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SUMMARY 

We address the problem of estimating the spherical-harmonic power spectrum of a sta- 
tistically isotropic scalar signal from noise-contaminated data on a region of the unit 
sphere. Three different methods of spectral estimation are considered: (i) the spheri- 
cal analogue of the one-dimensional (1-D) periodogram, (ii) the maximum likelihood 
method, and (iii) a spherical analogue of the 1-D niultitaper method. The periodogram 
exhibits strong spectral leakage, especially for small regions of area A ^ 47r, and is 
generally unsuitable for spherical spectral analysis applications, just as it is in 1-D. The 
maximum likelihood method is particularly useful in the case of nearly-whole-sphere 
coverage, A « 47r, and has been widely used in cosmology to estimate the spectrum of 
the cosmic microwave background radiation from spacecraft observations. The spherical 
multitaper method affords easy control over the fundamental trade-off between spectral 
resolution and variance, and is easily implemented regardless of the region size, requir- 
ing neither non-linear iteration nor large-scale matrix inversion. As a result, the method 
is ideally suited for most applications in geophysics, geodesy or planetary science, where 
the objective is to obtain a spatially localized estimate of the spectrum of a signal from 
noisy data within a pre-selected and typically small region. 

Key words: spectral analysis, spherical harmonics, statistical methods. 



1 INTRODUCTION 

^ , Problems involving the spectral analysis of data on the surface of a sphere arise in a variety of geodetic, geophysical, planetary, 
■ cosmological and other applications. In the vast majority of such applications the data are either inherently unavailable over 
the whole sphere, or the desired result is an estimate that is localized to a geographically limited portion thereof. In geodesy, 
statistical properties of gravity fields often need to be determined using data from an incompletely sampled sphere (e.g., 
Hwang 1993; AlberteUa et al. 1999; Pail et al. 2001; Swenson & Wahr 2002; Simons & Dahlen 2006). Similar problems arise 
in the study of (electro)magnetic anomalies in earth, planetary (e.g., Lesur 2006; Thebault et al. 2006) and even medical 
(e.g., Maniar & Mitra 2004; Chung et al. 2007) contexts. More specifically, in geophysics and planetary science, the local 
mechanical strength of the terrestrial or a planetary lithosphere can be inferred from the cross-spectrum of the surface 
topography and gravitational anomalies (e.g., McKenzie & Bowin 1976; Turcotte et al. 1981; Simons et al. 1997; Wieczorek & 
Simons 2005; Wieczorek 2007). Workers in astronomy and cosmology seek to estimate the spectrum of the pointwise function 
that characterizes the angular distribution of distant galaxies cataloged in sky surveys (e.g., Peebles 1973; Hauser & Peebles 
1973; Tegmark 1995). An even more important problem in cosmology is to estimate the spectrum of the cosmic microwave 
background or CMB radiation, either from ground-based temperature data coUected in a limited region of the sky or from 
spacecraft data that are contaminated by emission from our own galaxy and other bright non-cosmological radio sources 
(e.g., Gorski 1994; Bennett et al. 1996; Tegmark 1996, 1997; Tegmark et al. 1997; Bond et al. 1998; Oh et al. 1999; Wandelt 
et al. 2001; Hivon et al. 2002; Mortlock et al. 2002; Hinshaw et al. 2003; Efstathiou 2004). In this paper we consider the 
statistical problem of estimating the spherical-harmonic power spectrum of a noise-contaminated signal within a spatially 
localized region of a sphere. All of the methods that we discuss can easily be generalized to the multivariate case. 
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Figure 1. Geometry of the unit sphere n = {r : 
an arbitrary spacelimited region K = K\\J K'lVi ■ 



||r|| = 1}, showing, from left to right, colatitude < 6 < tt and longitude < ^ < 27r, 
•; an axisymmetric polar cap S < 0; and a double polar cap 9 < G and tt — < 5 < tt. 



2 PRELIMINARIES 

We denote points on the unit sphere n by r rather than the more commonly used f , preferring to reserve the circumflex 
to identify an estimate of a statistical variable. We use R to denote a region of within wfiich we have data from which 
we wish to extract a spatially localized spectral estimate; the region may consist of a number of unconnected subregions, 
R = R\\J Ri\J • • • , and it may have an irregularly shaped boundary, as shown in Fig. 1. We shall illustrate our results using 
two more regularly shaped regions, namely a polar cap of angular radius O and a pair of antipodal caps of common radius 
0, separated by an equatorial cut of width tt — 2G, as shown in the rigfitmost two panels of Fig. 1. An axisymmetric cap, 
which may be rotated to any desired location on the sphere, is an obvious initial choice for conducting localized spatiospectral 
analyses of planetary or geodetic data whereas an equatorial cut arises in the spectral analysis of spacecraft CMB temperature 
data, because of the need to mask foreground contamination from our own galactic plane. The surface area of the region R 
will be denoted by A. 



2.1 Spatial, pixel and spectral bases 

We shall find it convenient to switch back and forth among three different representations or bases which may be used to 
specify a given function on 0: 

(i) The familiar spatial basis in which a piccowiso contirmous function / is represented by its values /(r) at points r on Jl. 

(ii) The pixel basis in which the region R we wish to analyze is subdivided into equal-area pixels of solid angle Af2 = 4ttJ~^ . 
A function / is represented in the pixel basis by a J-dimensional column vector f = (/i /2 • • • fj)^, where fj = f{rj) is 
the value of / at pixel j, and J is the total number of pixels. Equal-area pixelization of a 2-D function /(r) on a portion R 
of O is analogous to the equispaced digitization of a finite 1-D time series /(t), <t <T. Integrals over the region R will be 
assumed to be approximated with sufficient accuracy by a Riemann sum over pixels: 



(1) 



Henceforth, in transforming between the spatial and pixel bases, we shall ignore the approximate nature of the equality 
in eq. (1). In cosmology, such an equal-area pixelization scheme is commonly used in the collection and analysis of CMB 
temperature data (e.g., Gorski et al. 2005); in the present paper we shall make extensive use of the pixel basis, even in the 
case that R is the whole sphere O, primarily because it enables an extremely succinct representation of expressions that would 
be much more unwieldy if expressed in the spatial basis. As a simple example we note that a double integral of the product 
of two symmetric functions over R can be written as 



F(r,r')F(r',r) dQ.dQ! = (An)^tr(FF) = (An)^tr(FF), 



(2) 



where F and F are symmetric matrices of dimension J x J with elements Fjji = F{rj,rji) and Fjji = F{rj,rji), and we have 
blithely replaced the symbol w by = as advertised. We shall consistently write pixel-basis column vectors and matrices using 
a bold, lower-case and upper-case, sans serif font, respectively, as above. 

(iii) The spectral basis in which a function / is represented in terms of its spherical harmonic expansion coefficients: 



Yimi^) where 



= / /(r)y;^(r)dO. 
Jn 



(3) 



The harmonics Yim(r) used in this paper are the complex surface spherical harmonics defined by Edmonds (1996), with 
properties that we review brieffy in the next subsection. An asterisk in eq. (3) and elsewhere in this paper denotes the 
complex conjugate. 
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2.2 Spherical harmonics 

Specifically, the functions llm(r) = Yim.{9,4>) are defined by the relations (e.g., Edmonds 1996; Dahlen & Tromp 1998) 
Yim {e,4>) = Xi^ {e) exp{im(l>) , (4) 
2Z + l\i/2 \{l-m"^'^^ 



X'mi0) = i-ir (^) 



{i + my. 

1 2xm/2 / d\ 2 ^ 



^(^(008 61), (5) 

^'UM) = ^(1-Mr-^^J (M^-iy, (6) 

where < 6 < tt is the colatitude and < </> < 27r is the longitude. The integer < / < oo is the angular degree of the 
spherical harmonic and — Z < m < / is its angular order. The function Pim{i-i) defined in eq. (6) is the associated Legendre 
function of degree I and order m. The choice of the multiplicative constants in equations (4)-(6) orthonormalizes the spherical 
harmonics on the unit sphere so that there are no \/47r factors in the spatial-to-spectral basis transformation (3): 



/ 

Jn 



YCm (r) Yi' m' (r) dfi = 5w S^m' ■ (7) 



The spherical harmonics Yim(r) are eigenfunctions of the Laplace-Beltrami operator, = 9^ + cotdde + {sm9)~^d^, 
with associated eigenvalues + 1). Harmonics of negative and positive order are related by Yi-rn{i^) = (^l)'"y;m(r). The 
/ 00 asymptotic wavenumber of a spherical harmonic of degree / is + l)]i/2 1 + 1/2 (Jeans 1923). A 2-D Dirac deha 
function on the sphere f2, with the replication property 



5(r,r'),f(r')df7' = /(r), (8) 
can be expressed as a spherical harmonic expansion in the form 

S{r, r') = ^ YUr) YUr') = ^ ^(2Z + 1) Pi(r • r'), (9) 

Im I 

where Pi{fi) = Pioifi) is the Legendre polynomial of degree I and the second equality is a consequence of the spherical harmonic 
addition theorem. A 1-D Dirac delta function can be expanded in terms of Legendre polynomials as 

^(m - m') = ^ 5^(2/ + l)Pi(M)fi(M')- (10) 

I 

In eqs (3), (9), (10) and throughout this paper we refrain from writing the limits of sums over spherical harmonic indices 
except in instances where we wish to be emphatic or it is essential. All spherical harmonic or spectral-basis sums without 
specifically designated limits will either be infinite, as in the case of the sums over degrees < / < oo above, or they will 
by limited naturally, e.g., by the restriction upon the orders — / < m < / or by the selection rules governing the Wigner 3-j 
symbols which we discuss next. 



2.3 Wigner 3-j and 6-j symbols 

We shall make frequent use of the well-known formula for the surface integral of a product of three spherical harmonics: 



/ 

Jn 



Yimir)Yp^{r)Yi,^,{r) dQ = 



i2l + l){2p + l){2l' + 1) 



1/2 



I p I' 





I p I' 

m q m' 



(11) 



where the arrays of integers arc Wigner 3-j symbols (Edmonds 1996; Messiah 2000). Both of the 3-j symbols in eq. (11) are 
zero except when (i) the bottom-row indices sum to zero, m + q + m' =0, and (ii) the top-row indices satisfy the triangle 
condition \l — l'\ < p < I + I' . The first symbol, with all zeroes in the bottom row, is non-zero only iil + p + 1' is even. A 
product of two spherical harmonics can be written as a sum of harmonics in the form 



V(m(r)Y'i'm' (r) 



E 

PI 



{2l + l){2p + l){2l' + 1) 



4lT 



1/2 



I p I' 





m 



P I' 
q m! 



The analogous formulas governing the Legendre polynomials Pi{p) are 



Pi{n)PMPi'Mdt-i = 2 



I p I' 




and Pi{n)Pv{i^) = Y,i^P+^) 



I 



I' 







(12) 



(13) 
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Two orthonormality relations governing the 3-j symbols are useful in what follows: 

(14) 



mm' 

provided the enclosed indices satisfy the triangle condition. The Wigncr 6-j symbol is a particular symmetric combination of 

six degree indices which arises in the quantum mechanical analysis of the coupling of three angular momenta; among a welter 
of formulas relating the 3-j and 6-j symbols, the most useful for our purposes are (Varshalovich et al. 1988; Messiah 2000) 

E,^.u+u'+p+v+v'+q f s e s' \ f u e' u' \ f s p u' \ f u p s' \ _ S^e'Sff / s e s' 1 , , 
, ' \t f t' f ^' J\t 1 -v' )\v -q t' J - 2e + l \u p u' j ^^^> 

i:(-ir'P.+.){: ; :;}(• ; •„)(: -„ »X' i -„)(; I i). (17) 

where the common array in curly braces is the 6-j symbol. Two simple special cases of the 3-j and 6-j symbols will be needed: 

and i , \ = -Sss'^uu'- (18) 



y v'2r+T \u p u \ y'(2s + l)(2w + l) 

Finally, we shall have occasion to use an asymptotic relation for the 3-j symbols, namely 

(2^+^)(o I o)'«2m[^^|-''^^/')]'"^[^'"-'|^'^/')]' ^''^ 

which is valid for Z w /' ^ p (Brussaaxd & Tolhoek 1957; Edmonds 1996). All of the degree and order indices in eqs (11)-(19) 
and throughout this paper are integers. 

Well-known recursion relations allow for the numerically stable computation of spherical harmonics (Libbrecht 1985; 

Dahlen & Tromp 1998; Masters & Richards-Dingcr 1998) and Wigncr 3-j and 6-j symbols (Schultcn & Gordon 1975; Luscombc 
& Luban 1998) to high degree and order. The numerous symmetry relations of the Wigner symbols can be exploited for efficient 
data base storage (Rasch & Yu 2003). 



2.4 Projection operator 

We use /^(r) to denote the restriction of a function /(r) defined everywhere on the sphere f2 to the region R, i.e., 
^ ^"^^"l otherwise. 

In the pixel basis restriction to the region R is accomplished with the aid of a projection operator: 

= Df where ° = ( ) ' "^^^^ 

In writing eqs (21) we have assumed that the entire sphere has been pixelized with those pixels located within R grouped 
together in the upper left corner, so that I is the identity operator within R. It is evident that = D and D = D^, as 
must be true for any (real) projection operator. In the spectral basis it is easily shown that the spherical harmonic expansion 
coefficients of /^(r) are given by 



flm = '^Dim,l'm'fl>m' where Am.i'm' = / i^im(r)V(/ 



,{r)dn. (22) 



The quantities Di^y^i are the elements of a spectral-basis projection operator, with properties analogous to those of the 
pixel-basis projector D, namely 

Dlm,pqDpq,l'm' = Dlm,l'm' and Di^ i,^,=Dii^i i^. (23) 

pq 

The first of eqs (23) can be verified by using the definition (22) of Di^^ii^i together with the representation (8)-(9) of the Dirac 
delta function. Neither the pixel-basis projection operator D nor the infinite-dimensional spectral-basis projection operator 
Dim,i'm' is invertible, except in the trivial case of projection onto the whole sphere, R = Q,. 
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2.5 Signal, noise and data 

Wo assume that the real-valued spatial-basis signal of interest, which we denote by 

s(r) = ^S(^Y',^(r), (24) 

Im 

is a realization of a zero-mean, Gaussian, isotropic, random process, with spherical harmonic coefficients sim. satisfying 
{sim.)=0 and {slmSl'm') = Si SwSmm' , (25) 

where the angle brackets denote an average over realizations. Such a stochastic signal is completely characterized by its 
angular power spectrum Si, < I < oo. The second of eqs (25) stipulates that the covariance of the signal is diagonal in the 
spectral representation. Wc denote the signal covariance matrix in the pixel basis by S = (ss^), where s = (si S2 • • • Sj)^ 
and Sj = s(rj). To evaluate S we note that 

(s(rj)s(ry)) = ^^{simst'm'}Yim{rj)Yir^,{rj,) 

Im I'm' 

= ^S,ll^(r,)Y-;„(r,.,) 

Im 

It is convenient in what follows to introduce the J x J symmetric matrix P; with elements 

= E yimiri)YUrr) = (^) Pi{r, ■ v,,). (27) 

m 

In particular, the pixel-basis covariance matrix may be written using this notation in the succinct form 

S = Y^SiPi. (28) 
I 

Eq. (28) shows that the signal covariance is not diagonal in the pixel representation. The total power of the signal integrated 
over the whole sphere is 

Stot= f {s\v))dn = Y^{2l + l)Su (29) 

Jn , 

and the power contained within the region R of area A < 47r is 

5tot = j {s'(r)) da = AntrS = ^Stot- (30) 

In general the signal s(r) in eq. (24) is contaminated by random measurement noise, 
^(r) = E '^imYlmir), (31) 

Im 

which we will also assume to be zero-mean, Gaussian and isotropic, 

(n;^) = and {ni^n'^>^') = Ni Su'Smm' , (32) 

with a known angular power spectrum Ni,0 < I < 00. The covariance of the noise in the pixel basis is given by the analogue 
of eq. (28), namely N = (nn^) = Ni P;. The simplest possible case is that of white noise, Ni = N — Afla^; the pixel-basis 
noise covariance then reduces to l\l = I, where a is the root- mean-square measurement noise per pixel and I is the J x J 
identity, by virtue of the pointwise relation 

^P; = (An)-M. (33) 

Eq. (33) is the pixel-basis analogue of the spatial-basis representation (8)-(9) of the Dirac delta function. The covariance of 
white noise is diagonal in both the spectral and pixel bases. 

The measured data, which we denote by d(r) or d = (di d2 • • • dj)'^ , consist of the signal plus the noise: 

d(r) = s(r) -l-n(r) or d = s -|- n. (34) 

We assume that the signal and noise are uncorrelated; i.e. (ns^) = (sn^) = 0. The pixel-basis covariance matrix of the data 
under these assumptions is 

C = (dd'') = (ss"') + (nn") =S + N = ^(S', + Ni) Pi. (35) 
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It is noteworthy that there are two different types of stochastic averaging going on in the above discussion: (sj^S;*^;) or (ss^) 
is planetary or cosmic averaging over aU reahzations of the signal s(r) or s, whereas (?^^m^^*'m') or (m^) is averaging over all 
realizations of the measurement noise n(r) or n. In what follows we will use a single pair of angle brackets to represent both 

averages: (•) = ((•)signal)noisc = ((')noisc)signal- 

In practice the CMB temperature data d = s + n in a cosmological experiment are convolved with the beam response of 
the measurement antenna or antennae, which must be determined independently. Harmonic degrees I whose angular scale is 
less than the finite aperture of the beam cannot be resolved; for illustrative purposes in section 10 we adopt a highly idealized 
noise model that accounts for this effect, namely 

Ni=Ana'e^p(^'^^y (36) 

where ^fwhm is the full width at half-maximum of the beam, which is assumed to be Gaussian (Knox 1995). For moderate 

angular degrees the noise (36) is white but for the unresolvable degrees, I ^ V81n2/6fwhm, it increases exponentially. Two 
other complications that arise in real-world cosmological applications will be ignored: (i) In general some pixels are sampled 

— 1/2 

more frequently than others; in that case, the constant noise per pixel a must be replaced by fJofj , where i^j is the number 
of observations of sample j. The resulting noise covariamce is then non-diagonal in both the spectral and pixel bases, (ii) CMB 
temperature data are generally collected in a variety of microwave bands, requiring consideration of the cross-covariance Cxy 
between different wavelengths A and A'. 



3 STATEMENT OF THE PROBLEM 

We are now in a position to give a formal statement of the problem that will be addressed in this paper: given data d = s -I- n 
over a region R of the sphere Q and given the noise covariance N, estimate the spectrum S*;, < Z < oo, of the signal. This is 
the 2-D spherical analogue of the more familiar problem of estimating the power spectrum S{ui) of a 1-D time series, given 
noise-contaminated data d{t) = s{t) +n{t) over a finite time interval < t < T. The 1-D spectral estimation problem has been 
extremely well studied and has spawned a substantial literature (e.g., Thomson 1982, 1990; Haykin 1991; Mullis & Scharf 
1991; Percival & Walden 1993). We shall compare three different spectral estimation methods: (i) the spherical analogue of 
the classical periodogram, which is unsatisfactory for the same strong spectral leakage reasons as in 1-D; (ii) the max;imum 
likelihood method, which has been developed and widely applied in CMB cosmology (e.g.. Bond et al. 1998; Oh et al. 1999; 
Hinshaw et al. 2003); and (iii) a spherical analogue of the 1-D multitaper method (Wieczorek & Simons 2005; Simons et al. 
2006; Simons & Dahlen 2006; Wieczorek & Simons 2007). 



4 WHOLE-SPHERE DATA 



It is instructive to first consider the case in which usable data d 
obvious choice for the spectral estimator in that case is 

2 

d(r)r4(r)dO 



AWS _ 



1 



2/-I-1 ^ 



N,. 



s + n are available over the whole sphere, i.e., R = Q. An 



(37) 



where the first term is the conventional definition of the degree-/ power of the data d(r) and — as we shall show momentarily — 
the subtracted constant A'^; corrects the estimate for the bias due to noise. In the pixel basis eq. (37) is rewritten in the form 

Sr = ^[d^Pzd-tr(NPO]. (38) 

The equivalence of eqs (37) and (38) can be confirmed with the aid of the whole-sphere double-integral identity 

tr(PiP,0 = (AO)-2(2/ + l)5„,. (39) 

To verify the relation (39) it suffices to substitute the definition (27), transform from the pixel to the spatial basis, and utilize 
the spherical harmonic orthonormality relation (7). The superscript WS identifies the equivalent expressions (37)-(38) as the 
whole-sphere estimator, Sj^^ is said to be a quadratic estimator because it is quadratic in the data d. Every spectral estimator 
that we shall consider subsequently, in the more general case R ^ Q, has the same general form as eqs (37)-(38): a first term 
that is quadratic in d and a second, subtracted constant term that corrects for the bias due to noise. 
The expected value of the whole-sphere estimator Sj^^ is 

(AO)'' 



/ AWS\ 

/ — 



21 + 1 

21 + 1 
(A0)2 



[tr(CPi)-tr(NPi)] 
tr(SPO 



noise bias cancels 



21 + 

= Si, 



(40) 
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where the first equation follows from (d^Pjd) = tr(CP;) through eq. (35). The result (40) shows that, when averaged over 
infinitely many realizations, the whole-sphere expressions (37)-(38) will return an estimate that will coincide exactly with the 
true spectrum: (Sj^^) = Si. Such an estimator is said to be unbiased. 

We denote the covariance of two whole-sphere estimates Sj^^ and at diflferent angular degrees I and I' by 

Sii' =COV[Sl ,6;/ j, (41) 

where as usual by cov(d, d') we mean 

cov(d,d') = {{d-{d)){d' - (d'))) = {dd') - {d){d'). (42) 

To compute the covariance of a quadratic estimator such as (37)-(38) we make use of an identity due to Isserlis (1916), 

cov(did2, ^3^4) = cov(di, da) cov(d2, 1^4) + cov(di, 0(4) cov(d2, da), (43) 

which is valid for any four scalar Gaussian random variables di, d2, ds and d4. Using eq. (43) and the symmetry of the matrices 
Pi, Pi' and C to reduce the expression cov(d^P;d, d^Pj/d), it is straightforward to show that 

where the factor of two arises because the two terms on the right side of the Isserlis identity are in this case identical. To 
evaluate the scalar quantity tr(CPiCP(/) wc substitute the representation (35) of the data covariance matrix C, and transform 
the result into a fourfold integral over the sphere f2 in the spatial basis. Spherical harmonic orthonormality (7) obligingly 
eliminates almost everything in sight, leaving the simple result 

Y:T = ^{Si+NifSu,. (45) 

The Kronecker delta Sn/ in eq. (45) is an indication that whole-sphere estimates §^^,3^^ of the spectrum Si,Sii are uncor- 
related as well as unbiased. 

The formula for the variance of an estimate, 

.s,iSn = ^r = ^iSi+Ni)\ (46) 

can be understood on the basis of elementary statistical considerations (Knox 1995). The estimate Sj^^ in eq. (37) can be 
regarded as a linear combination of 2/ + 1 samples of the power \dim\'^,—l < m < I, where dim is drawn from a Gaussian 
distribution with variance Si + Ni. The resulting statistic has a chi-squared distribution with a variance equal to twice the 
squared variance of the underlying Gaussian distribution divided by the number of samples (e.g., Bcndat & Piersol 2000); this 
accounts for the factors of 2/(2Z + 1) and {Si + Ni)^ in eq. (46). It may seem surprising that vax(S'j^^) > even in the absence 
of measurement noise, Ni = 0; however, there is always a sampling variance when drawing from a random distribution no 
matter how precisely each sample is measured. This noise-free planetary or cosmic variance sets a fundamental limit on the 
uncertainty of a spectral estimate that cannot be reduced by experimental improvements. 

In applications where we do not have any a priori knowledge about the statistics of the noise n, we have no choice but to 
omit the terms Ni and tr(NP() in eqs (37)-(38). The estimate Sf^ is then biased by the noise, {Sf^^) =Si + Ni; nevertheless, 
the formula (45) for the covariance remains valid. Similar remarks apply to the other estimators that we shall consider in the 
more general case R^Q,. We shall employ the whole-sphere variance var(S';^^) of eq. (46) as a "gold standard" of comparison 
for these other estimators. 



5 CUT-SPHERE DATA: THE PERIODOGRAM 

Suppose now that we only have (or more commonly in geophysics we only wish to consider) data d(r) or d = (di d2 • • • dj) ^ 
over a portion R of the sphere O, with surface area A < 4n. 

5.1 Boxcar window function 

It is convenient in this case to regard the data d(r) as having been multiplied by a unit-valued boxcar window function, 
Kr) = j:bMr) = {l 'il^J^^ (47) 

pq 

confined to the region R. The power spectrum of the boxcar window (47) is 

^-=2^El''-l'- (48) 
1 
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Using a classical Legendre integral formula due to Byerly (1893) it can be shown that eq. (48) reduces, in the case of a single 
axisymmetric polar cap of angular radius and a double polar cap complementary to an equatorial cut of width tt — 2G, to 

B;''" = 7r(2p+ 1)-^ [Pp_i(cose) - Pp+i(cose)]^ , (49) 

BT-l ^^^^TJ" (50) 

' 10 il p IS odd, ^ ' 

where P-i{n) = 1. As a special case of eqs (49) (50), the power of the p = or dc component in these two instances is 
Sg'^P = 7r(l - cos e)^ = .4-7(47r), Bg"' = 4B^^''^ = A^/{4tt). In fact, the dc power of any boxcar 6(r), no matter how irregularly 
shaped, is Bq = A^/{4tt). 

The whole-sphere identity (39) is generalized in the case R ^ Q to 

tr(P,P(0 = {An)-' ^ \Di^,Vm'\^ , (51) 

mm' 

where the quantities 

Di^,Vm' ^ I Y:^{r)Yi,^,{v)dn (52) 

JR 

are the matrix elements of the spectral-basis projection operator defined in eq. (22). We can express this in terms of the power 
spectral coefficients Bp by first using the boxcar (47) to rewrite eq. (52) as an integral over the whole sphere O, and then 
making use of the formula for integrating a product of three spherical harmonics, eq. (11): 



tr(P,P,0 = (An)-'^ 



^hpq / Yil,{r)Ypq{r)Yi,m'{r)dn 



^''tyiy^^ E E V(2P + 1)(2P' + 1)6., K'. 

pq p'ql 

I p l'\f I p' l'\^f I P \ f I P' I' \ (ro. 

J \ J m q m' J \ m q' m' J ' ^ ' 

The 3-j orthonormality relation (15) can be used to reduce the final double sum in eq. (53), leading to the simple result 

p 

In the limit ^ — > 47r of whole-sphere coverage. Bp — » 4n5po and the 3-j symbol with p = is given by the first of eqs (18), so 
that eq. (54) reduces to the result (39) as expected. 

Fig. 2 shows the normalized boxcar power spectra Bp/ Bq associated with axisymmetric single and double polar caps of 
various angular radii. For a given radius O, eqs (49)-(50) show that (Bp/Bo)*^"* has a shape identical to [Bp / B^Y''^'^ , but with 
the odd degrees removed; to avoid duplication, we illustrate the spectra for single caps of radii = 10°, 20°, 30° and double 
caps of common radii = 60° , 70° , 80° . The scales along the top of each plot show the number of asymptotic wavelengths 
that just fit within cither the single cap or one of the two double caps; one perfectly fitting wavelength corresponds to a 
spherical harmonic of degree pe given by [pe(pe + 1)]^^^ = 18O°/0, two wavelengths to a degree pe/2 ^ 2pe, and so on. A 
rough rule-of-thumb is that Bp <C So (say 10-20 dB down from the maximum) for all harmonics that are large enough to 
easily accommodate at least one or two wavelengths within a cap, i.e., for all p > {1-2} x pe- 

Fig. 3 shows a contour plot of the normalized power Bp /Bo for spherical harmonic degrees < p < 100 and single caps 
(left) and double caps (right) of radii 0° < < 90°. A double cap of common radius = 90° covers the whole sphere and has 
power Bp = inSpo- The curves labeled {1-5} x are isolines of the functions [p{p+ 1)]^^^ = {l-5}x (18O°/0), which correspond 
to the specified number of asymptotic wavelengths just fitting within a single polar cap. These isolines roughly coincide with 
the {1 5}x(— 10 dB) contours of the power Bp/Bo, respectively, confirming the conclusion inferred from Fig. 2 that Bp <^ Bq 
for all spherical harmonic degrees p that are able to comfortably fit one or two wavelengths within either a single or double 
cap of arbitrary radius 0° < < 90° . Sums involving Bp such as eq. (54) converge relatively rapidly as a result of this strong 
decay of the high-degree boxcar power. 



5.2 Periodogram estimator 

A naive estimator of the signal power Si in the case His the spherical analogue of the periodogram, introduced into 1-D 
time series analysis by Schuster (1898): 

2 

-Y^KwNi,, (55) 

v 
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degree p degree p 

Figure 2. Bar plots of the normalized power Bp/Bo versus angular degree p for various boxcar windows b(r) as defined by eq. (47). 
Inset schematic thumbnails show the shapes of the regions considered: axisymmetric polar caps of angular radii = 10° , 20° , 30° (left) 
and double polar caps of common radii = 60° , 70° , 80° (right). Abscissa in all cases is logarithmic, measured in dB = 10 log]^Q(i3p/-Bo)- 
Topmost scales show the number of asymptotic wavelengths that just fit within either a single cap (left) or one of the two double polar 
caps (right) . The odd-degree values of the double-cap power Bp are all identically zero for reasons of symmetry; see eq. (50) . 



where we have introduced the matrix 

mm' ^ ^ p ^ ^ 

The subtracted term in eq. (55) is simply a known constant which — as we will show — corrects the estimate for the bias 
due to noise. In the pixel basis eqs (55)-(56) become 

the only difference with the whole-sphere estimator (38) being the leading factor of An /A and the fact that the vector 
and matrix multiplications represent spatial-basis integrations over the region R rather than over the whole sphere f2. The 
superscript SP identifies eqs (55) and (57) as the spherical periodogram estimator. When A = Att, Kui = Sui . 



5.3 Leakage bias 

To find the expected value of Sf^ we proceed just as in reducing eq. (40): 
= (f)tS?WCP.,-MNP.)l 



A J 21 + 1 
47r\ (Af7)" 



tr(SPi) noise bias cancels 



- (f)t^E*MP.P,0 

i' 

= (58) 
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single cap radius @ double cap radius @ 



-60 -50 -40 -30 -20 -10 
boxcar power B^/B^ (dB) 

Figure 3. Grcy-scalc contour plots of the normalized boxcar power Bp /Bo, measured in dB, versus angular degree < p < 100, measured 
downward on the vertical axis, and single or double polar cap radius 0° < < 90°, on the horizontal axis. Isolines + 1)]'^'^^ = 
{1— 5}x(180°/©) designate the number {1-5} of asymptotic wavelengths that just fit within a single polar cap. Thumbnail insets again 
show the shapes of the regions considered. The double-cap power is "striped" because -Bp"* = for odd p. 



where we used the definition (56) of Kui to obtain the final equality. The calculation in eq. (58) confirms the equivalence 
of eqs (55) and (57), and shows that, unlike the whole-sphere estimator S'^^, the periodogram is biased, inasmuch as 
(Sf^) ^ Si. The source of this bias is leakage from the power in neighboring spherical harmonic degrees I' = I ±1,1 ±2, . . .. 
We shall refer to the matrix Kui, introduced in a cosmological context by Peebles (1973), Hauser & Peebles (1973) and Hivon 
et al. (2002) , as the periodogram coupling matrix, since it governs the extent to which an estimate Sf^ of Si is influenced by 
this spectral leakage. The 3-j identity 



(59) 



which is a special case of the orthonormality relation (14), guarantees that every row of Kui sums to unity, 



= i + 1) Bp = i / b\r) dQ = 1, (60) 



so that there is no leakage bias only in the case of a perfectly white spectrum: 

{Sf^} =S if Si=S. (61) 

This is in fact why we introduced the factor of Att/A in eqs (55) and (57): to ensure the desirable result (61). For pixelized 
measurements with a white noise spectrum, Ni = N = AQa^, the subtracted noise-bias correction term in eq. (55) reduces 
to N = AQcr^, as in eq. (37). In the whole-sphere limit, Bp 'iivSpo so that Kn/ — > Su' and (Sf^) — » Si, as expected. 
In the opposite limit of a connected, infinitesimally small region, 

A^O and ^(2Z -h 1) oo with (^-^^ ^(2/ -|- 1) = 1 held flxed, (62) 
I I 

the inverse-area-scaled boxcar A^^b{r) tends to a Dirac delta function (5(r, R), where R is the pointwise location of the 
region R, so that the boxcar power is white: Bp A'^/{4tv). The spectral-basis projector (52) tends in the same limit to 
Dim,i'm' -^^imi^) ^I'm'i^)' ^^^^ the coupliug matrix (56) reduces to 

Kw -» ^(2«' -I- 1) for all < Z < oo. (63) 

Eq. (63) highlights the fact that there is strong coupling among all spherical harmonic degrees 1,1' in the limit (62); in 
fact, the expected value of the periodogram estimate is then simply the total signal power contained within the infinitesimal 
measurement region: (Sf^) — » The fixity constraint upon the limit (62) guarantees that the rows of the coupling 

matrix (63) sum to unity, in accordance with eq. (60). 
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degree 1' degree 1' 

Figure 4. Bar plots of the pcriodogram coupling matrix 100 X Kur for single polar caps of radii O = 10° , 20° , 30° (left) and double 
caps of common radii O = 60°, 70°, 80° (right). The tick marks are at /' = 0, 20, 40, 60, 80, 100 on every offset abscissa; the target degrees 
I = 0,20,40,60 are indicated on the right. Numbers on top are the maximum diagonal value 100 X Kn for every target degree I. The 
double-cap matrix is alternating, Kn/ = if |( — Z'| odd, since the 3-j symbols are zero whenever I + p + 1' is odd and S^"* = if p odd. 



In Fig. 4 we illustrate the periodogram coupling matrix Kui for the same single polar caps of radii 6 = 10°, 20°, 30° and 
double polar caps of common radii G = 60°, 70°, 80° as in Figs. 2 and 3. In particular, for various values of the target angular 
degree / = 0, 20, 40, 60, we exhibit the variation of Kui as a function of the column index I'; this format highlights the spectral 
leakage that is the source of the bias described by eq. (58). The quantity we actually plot is 100 x Kui, so that the height 
of each bar reflects the percent leakage of the power at degree I' into the periodogram estimate Sf^ , in accordance with the 
constraint that all of the bars must sum to 100 percent, by virtue of eq. (60). At small target degrees I m the variation of 
Kill with I' is influenced by the triangle condition that applies to the 3-j symbols in eq. (56), but in the limit Z — »■ oo the 
coupling matrix takes on a universal shape that is approximately described by 

* (t) [X,|,_,,|(7r/2)]' , (64) 

p 

as a consequence of the 3-j asymptotic relation (19); this satisfies the constraint eq. (60). This tendency for Kui to maintain 
its shape and just translate to the next large target degree is apparent in all of the plots. 

It is evident from both eq. (56) and the plots of Kw in Fig. 4 that a small measurement region, with A <C 4n, gives rise 
to much more extensive coupling and broadband spectral leakage than a large region, with ^ ~ 47r. We quantify this relation 
between the extent of the coupling and the size of the region R in Fig. 5, in which we plot the large-/ limits of the matrix 
Kill in eq. (56) as a function of the offset from the target degree for the same single-cap and double-cap regions as in Fig. 4. 
The common abscissa in all plots is measured in asymptotic wavelengths, — 3 < J/ < 3, defined by |Z' — l\ = Pe/H, or indeed 
I' — I K, i/pQ where [pe(pe + 1)]^^^ = 180° /0, and delineated along the top; the /' — / scales along the bottom vary depending 
upon the cap size Q. It is clear from this format that Kui is always substantially less than its peak diagonal value Ku, so 
that the coupling and spectral leakage are weak, whenever |Z' — Z| > {1-2} x pe- The extent of the periodogram coupling thus 
scales directly with the radius of a single or double polar cap. The resulting broadband character of the spectral leakage for 
small regions, with A <C 47r, is a highly undesirable feature of the periodogram, which argues against its use in applications. 
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Figure 5. Large-i limits of the periodogram coupling matrix 100 x Kni for single polar caps of radii = 10°, 20°, 30° (left) and double 
caps of radii = 60°, 70°, 80° (right). The common abscissa is the offset from the target angular degree, measured in asymptotic 
wavelengths, Z' — Z ~ fpQ . The limiting shapes were found empirically by increasing I until the plots no longer changed visibly. The exact 
coupling matrix (56) is asymmetric because of the leading factor of 21' + 1; the slight left-right asymmetry visible here is not retained in 
the asymptotic result (64). Small numbers in upper left corner give the percent coupling outside the boundaries —3 < < 3 of each plot. 



5.4 Periodogram covariance 



Making use of the Isserlis identity (43) we find tliat tlie covariance of two periodogram estimates Sf^ and Sf,^ at different 
degrees I and I' is given by a pixel-basis formula very similar to eq. (44), 



E;;, = COV(6i ,b, 



SP qSP\ 



{2l + l){2l' + 1) 



tr(CPiCPi: 



(65) 



with the important difference that tr(CP(CPi/) now represents a fourfold integral over the region R rather than over the whole 
sphere ft. Inserting the representation (35) of the data covariance matrix C and transforming to the spatial basis, we obtain 
the result 

2{4Tv/Af V- ■ 



vSP 
2-'W — 



{2l + l){2l' + 



T)^ 



pq 



{Sp + Np)Dlm,pqDpq^i''r. 



(66) 



which reduces to eq. (45) in the limit of whole-sphere data coverage, when Di^y^i = &ii'5mm' ■ Using the boxcar function 
6(r) to rewrite Dim,i'm' as an integral over the whole sphere f2 as in our reduction of eq. (51) we can express the covariance 
of a periodogram spectral estimate in terms of Wigner 3-j symbols: 



vSP 



-y 



E 

pq 



{2p + l)(Sp + N^)J2J2 V(2s + l)(2s' + b:,t' 



I p s 




/' p s' 




I 

m 



p s 
q t 



p s 
q t' 



(67) 



Eqs (66) and (67) are exact and show that every element of the periodogram covariance is non-negative: Ej;/ > 0, with 
equality prevailing only for I ^ I' in the limit of whole-sphere coverage, A = 47r. We shall obtain a more palatable approximate 
expression for Sf;^, valid for a moderately colored spectrum, in subsection 8.1. 
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5.5 Deconvolved periodogram 

In principle it is possible to eliminate the leakage bias in the periodogram estimate Sf^ by numerical inversion of the coupling 
matrix Kn^ . The expected value of the deconvolved periodogram estimator, defined by 

S?- = J2Ku,'sr, (68) 

I' 

is clearly (Sf^) = Si. The corresponding covariance is given by the usual formula for the covariance of a linear combination 
of estimates (Menke 1989): 

=cov(^P^^^) =E^r/S^p"'^y? (69) 
pp' 

where K~,^ = K^^,. In practice the deconvolution (68) is only feasible when the region R covers most of the sphere, A « 4n; 
for any region whose area A is significantly smaller than 4ir, the periodogram coupling matrix (56) will be too ill-conditioned 
to be invertible. 



6 MAXIMUM LIKELIHOOD ESTIMATION 

In this section we review the maximum likelihood method of spectral estimation, which has been developed and applied 
by a large number of cosmological investigators to CMB temperature data from ground-based surveys as well as two space 
missions: the Cosmic Background Explorer (COBE) satellite and the Wilkinson Microwave Anisotropy Project (WMAP). Our 
discussion draws heavily upon the analyses by Tegmark (1997), Tegmark et al. (1997), Bond et al. (1998), Oh et al. (1999) 
and Hinshaw et al. (2003) 



6.1 Likelihood function 

The starting point of the analysis is the likelihood £.{Si,<i) that one will observe the pixel-basis data d = (di •• • dj)^ 
given the spectrum Si. We model this likelihood as Gaussian: 

exp(-id'^C-'d) 

£iSi,d)= ^ , (70) 

where is the inverse of the data covariance matrix defined in cq. (35), C~^C = CC~^ — I, and J is the total number 
of observational pixels as before. The notation is intended to imply that £,{Si,d) depends upon all of the spectral values 
Si,0 < I < 00 ; the maximum likelihood estimator is the spectrum Si that maximizes the multivariate Gaussian likelihood 
function (70) for measured data d. 

Maximization of jC{Si,d) is equivalent to minimization of the logarithmic likelihood 

L(S'(,d) = -21n£(S(,d) =ln(detC)-|-d'^C"^d-|- Jln(27r). (71) 

To minimize L(Si,d) we differentiate with respect to the unknowns Si using the identity In(detC) = tr(lnC) and 

dC „ aC-l r-inr-i ^(InC) 

dSi-^'' -dsT^"^ ' ~dsr'^ ' ' 

The first equality in eq. (72) follows from eq. (35), the others axe the result of matrix identities. The resulting minimization 
condition is 

1^ = -d^C-^P,C-^d + tr(C-^Pi) =0. (73) 
The ensemble average of eq. (73) is 

(g) = -tr(C-ipO+tr(C-pO=0, (74) 

verifying that the maximum likelihood estimate is correct on average in the sense that the average slope (dL/dSi) is zero at 
the point corresponding to the true spectrum Si. The curvature of the logarithmic likelihood function L{Si,d) is 

= d^C-iPiC-^P;-C-^d + d^C-^Pi,C-^PiC-'d - tr (C-'PiC-'Pi,) . (75) 
In the vicinity of the minimum we can expand L{Si,d) in a Taylor series: 

Lisi + ssi,d) = Lisi, d) + E (g) -^^^ + ^ E (afe;) ''^' + -- (76) 
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The quantities d'^L/dSi dSi, are the elements of the Hessian of the logarithmic likelihood function; likewise, we shall write 
{d^L/dSi dSii)~^ to denote the elements of its inverse. Ignoring the higher-order terms •••in eq. (76) we can write the 
minimization condition (73) in the form 

''•-i:{l£kT (-il) (sH;)" Kc-'P„c-d-.,(c-P.,)] . (77) 

Eq. (77) is the classical Newton-Raphson iterative algorithm for the minimization of L{Si,d). Starting with an initial guess 
for the spectrum Si the method uses eq. (77) to find 5 Si, updates the spectrum Si —>■ Si + 5 Si, re-evaluates the right side, 
and so on until convergence, 5 Si — » 0, is attained (see, e.g., Strang 1986; Press et al. 1992). 



6.2 Quadratic estimator 

For large data vectors d computation of the logarithmic likelihood curvature (75) is generally prohibitive and it is customary 
to replace ^{d'^L/dSi dSi>) by its ensemble average, which is known as the Fisher matrix: 



itr(C-^P,C-iPr). (78) 



""' ~ 2\dSi dSi, / ~ 2 

Note that like the curvature (75) itself the Fisher matrix (78) is symmetric, Fui = Fin, and positive definite. Upon substituting 
iF~/ for the inverse Hessian (JfiL/dSi dSi/)~^ in eq. (77), we obtain a Newton-Raphson algorithm that is computationally 
more tractable, and guaranteed to converge (albeit by a different iteration path) to the same local minimum: 

SSi = lY.^u' [d^C-^P,,C-^d-tr(C-^P,,)] . (79) 
I' 

The second term in brackets in eq. (79) can be manipulated as follows: 

tr(C-ip;/) =tr(C-^P,,C-iC) = ^tr(C-iP;,C-ip„) (S^ -h Ar„) = 2 ^ F,,„(S'„ -h 7V„). (80) 

n n 

This enables us to rewrite the iteration (79) in the form 

Si+5Si = ^^Fu^ [d^C-iPrC-id-tr(C-ip,/C-iN)] . (81) 
(' 

In particular, at the minimum, where SSi = 0, the minimum conditions (73) are satisfied and eq. (81) reduces to 

5^ = d^Z,d - tr(NZO, (82) 

where we have defined a new symmetric matrix, 

^^ = ^J2^u"{C-'Pi,C-'). (83) 
I' 

The superscript ML designates Sf^^ as the maximum likelihood estimator. Eq. (82) is quadratic in the data d and has the 
same form as the whole-sphere and periodogram estimators and Sf^ , but with an important difference: the right sides 
of eqs (38) and (57) are independent of the spectrum Si whereas the matrix Z; in eq. (83) depends upon St. In fact, eq. (82) 
can be regarded as a fixed-point equation of the form Sf^^ = /(d, Sf^^), where the right side exhibits a quadratic dependence 
upon d but a more general dependence upon the unknown spectral estimates Sf^^, < Z < oo. Maximum likelihood estimation 
is inherently non-linear, requiring iteration to converge to the local minimum Sf^^. 



6.3 Mean and covar lance 

The maximum likelihood method yields an unbiased estimate of the spectrum inasmuch as 
(5D = tr(CZz)-tr(NZ() 

= tr(SZ,) noise bias cancels 

= i^F,7/^5,tr(C-ip,,C-ip,) 

V p 
!' p 

= Si. (84) 
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Using the Isscrlis identity (43) to compute the covariance of two estimates Sf^^ and S^^, we find that 
) 

2tr(CZiCZ,/) 



-,ML /AML rrML\ 



\ p p' / 

p p' 

= X] ^ X] ^'~p' ^^'^ 



p p' 

= F-,\ (85) 

The calculation in eq. (85) shows that the maximum likelihood covariance Sjf,'" is the inverse F~,^ of the ubiquitous Fisher 
matrix (78). The method depends upon our ability to invert Fip and, as we shall elaborate in subsection 6.6, this is only 
numerically feasible in the case of nearly- whole-sphere coverage, A m 4tt. 



6.4 The Fisher matrix 

Pixel-basis computation of the Fisher matrix Fni = i tr(C~^P;C~^P;') requires numerical inversion of the J x J covariance 
matrix C. Transforming to the spatial basis, we can instead write the definition (78) in terms of the inverse data covariance 
function C7~^(r, r') equivalent to the pixel-basis inverse (An)-^C"^ in the form 

^«' = ^IIl^f".«'-'l% (86) 

mm' 

where 

(87) 



Vim,i'm'= jj YiUr)C-'{r,r')Yi,^,{r')dQda'. 



Among other things, eq. (86) shows that every element of the Fisher matrix is non-negative: Fni > 0. To compute the matrix 
elements (87) in the absence of an explicit expression for C"^(r,r') in the case R ^ Q we can find the auxiliary spacelimited 
function 



-'^ Im 



by solving the spatial-basis integral equation 

/ C(r,r')W(r')dn' = y,,^,(r), v € R, (89) 



I R 

where 



C(r, r') = J2(Sp + Np) Yp^ir) y;,(r') = ^ ^(2p + l)(5p + Np) Pp{v ■ v'). (90) 



47r 

pq p 

Alternatively, we can transform eq. (89) to the spectral basis and solve 

X^ 5^'^''"'^'^'^^ ^P^F>pq,stV,t,l'm' = Dlm,l'm'- (91) 

st pq 

In the case of an axisymmetric region such as a polar cap or equatorial cut, the spatial-basis and spectral-basis inverse 
problems (89) and (91) can be decomposed into a series of simpler problems, one for each fixed, non-negative order m; this 
axisymmetric reduction is straightforward and will not be detailed here. 

In the limiting case of whole-sphere coverage, R = Ct, the pixel-basis covariance matrix (35) can be inverted analytically, 
C"^ = (Af2)2 + Niy^Pi, and the Fisher matrix (78) reduces to 

Fu' = ^{2l + l)iSi + Ni)-^Sw, (92) 

where we have used the whole-sphere identity (39). The result (92) can also be obtained from eqs (86) and (91) by recalling 
that Dim.i'rn' = (^n't^mm' H R ~ ft- In fact, the maximum likelihood estimate (82) coincides in this limiting case with the 
whole-sphere estimate (37), = 5,^^, and the covariance (85) reduces to Ejf,^ = F"/ = 2{2l + 1)"^ {Si + Nif Sw , in 
agreement with eq. (45), as expected. We give an explicit approximate formula that generalizes eq. (92) to the case of a region 
R^ in subsection 8.2. 
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6.5 Cramer-Rao lite 

Maximum likelihood estimation is the method of choice in a wide variety of statistical applications, including CMB cosmology. 
In large part this popularity is due to a powerful theorem due to Fisher, Cramer and Rao, which guarantees that the maximum 
likelihood method yields the best unbiased estimator in the sense that it has lower variance than any other estimate; i.e., in 
the present spherical spectral estimation problem, 

var(S'l^'^) = F(7^ < var(S'i) for any Si satisfying (Si) = Si. (93) 

A general statement and proof of this so-called Cramer-Rao inequality is daunting (see, e.g., Kendall & Stuart 1969); however, 
it is straightforward to prove the limited result (93) if we confine ourselves to the class of quadratic estimators, of the form 

S,=drZid-tr{NZ,), (94) 

where the second term corrects for the bias due to noise as usual, and where the symmetric matrix Z; remains to be determined. 
The ensemble average of eq. (94) is 

(Si) =Y,Zii'Si, where = tr(Z;P,0, (95) 

I' 

so that the condition that there be no leakage bias, i.,e., (Si) = Si, is that Zu, = Su,; and the covariance between two estimates 
of the form (94), by another application of the Isserlis identity (43), is 

T,ii, =cov {Si, Si,) =2tr(CZ,CZ;,)- (96) 

To find the minimum- variance, unbiased quadratic estimator we therefore seek to minimize var(Si) = 2tr(CZ(CZj) subject to 
the constraints that Zu, = tr(Z;P(/) = dn,. Introducing Lagrange multipliers rji, we are led to the variational problem 

= tr(CZiCZ() - ^ r]i, [tr(Z(P,0 - Sw] = minimum. (97) 
I' 

Demanding that 6^i = for arbitrary variations 6Zi of the unknowns Z; gives the relation 

2(CZiC) =^r7,,P,, or Zz = i ^ (C-^PrC-^) . (98) 
I' I' 

To find the multipliers rji, that render tr(ZjPj//) = Su,, we multiply eq. (98) by P;// and take the trace: 

J2 ni'Fvi" = tr(ZiPi„) = Su', or r]i, = F"/. (99) 
Upon substituting eq. (99) into eq. (98) we obtain the final result 

which is identical to eq. (83). This argument, due to Tegmark (1997), shows that the maximum likelihood estimator (82) is 
the best unbiased quadratic estimator, in the sense (93). 



6.6 To bin or not to bin 

The maximum likelihood method as described above is applicable only to measurements d that cover most of the sphere, e.g., 
to spacecraft surveys of the whole-sky CMB temperature field with a relatively narrow galactic cut. For smaller regions the 
method fails because the degree-by-degree Fisher matrix Fn, is too ill-conditioned to be numerically invertible. Fundamentally, 
this is due to the strong correlation among adjacent spectral estimates Sf^^, S^^ within a band of width \l' — l\ « {1-2} x pe, 
where as before pe is the degree of the spherical harmonic that just fits a single asymptotic wavelength into the region 
of dimension Q « (2A/7r)^/'^. In view of this strong correlation it is both appropriate and necessary to sacrifice spectral 
resolution, and seek instead the best unbiased estimates S^^ of a sequence of binned linear combinations of the individual 
spectral values Si , of the form 

Sb = J2WbiSi. (101) 

I 

We shall assume that the bins B arc sufficiently non-overlapping for the non-square weight matrix Wbi to be of full row 
rank, and we shall stipulate that every row sums to unity, i.e. Wbi = 1, to ensure that {db^) = S \n the case of a white 
spectrum, Si = S. Apart from these constraints, the weights can be anything we wish; e.g., a boxcar or uniformly weighted 
average Wbi = &i^bI X^i'gs' "^^here (5;gs is one if degree I is in bin B and zero otherwise, and the denominator is the width 
of the bin. 

Because we must resort to estimating band averages Sb we are obliged to adopt a different statistical viewpoint in 
the maximum likelihood estimation procedure; specifically, we shall suppose that Si can be adequately approximated by a 
coarser-grained spectrum, 
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SI=Y.^ibSb, (102) 

B 

where Wig is the Moore- Penrose generahzed inverse or pseudoinverse of the weight matrix Wbi (Strang 1998). Because Wbi 
is of full row rank, W^g is the purely underdetermined pseudoinverse, given by 

^iB = Yl (Y1 ^b'v W^/b j , (103) 

where — Wgi and the second term is the inverse of the enclosed symmetric matrix (Mcnkc 1989; Gubbins 2004). The 
coarse-grained spectrum (102) is the minimum-norm solution of eq. (101) with no component in the null-space of Wbi', in 
other words, Sj is the part of Si that can be faithfully recovered from the binned values Sb- Since W^g in eq. (103) is a right 
inverse of Wbi, i.e. WgiW^g, = Sggi , the spectra S\ and Si have identical binned averages, Sjj = WgiS] — Sb- For 
the simplest case of contiguous, boxcar-weighted bins. Wig = {Si^b)"^ so that S] is a staircase spectrum, constant and equal 
to Sb in every bin B. 

The coarse-grained spectrum Sj gives rise to an associated, coarse-grained representation of the data covariance matrix 
C in eq. (35), namely 

C+ = St Nt = J^iSj + N^)Pi = Y'^Sb + Nb)Pb, (104) 

l B 

where Nb and iV/ are defined in terms of Ni by the analogues of eqs (101)-(102), and where the vector Pb = dC^ /OSb is 

-^i:(i;)(g)=i:-v. 

To estimate the binned spectrum (101) we consider a new likelihood function C{Sb, d) of the form (70) but with replaced 
by the coarse-grained inverse matrix C^^, and minimize by differentiating the log likelihood L{Sb,A) = — 21ni2(S's, d) with 
respect to the unknowns Sb - Every step in the derivation leading to eq. (82) can be duplicated with the degree indices I and I' 
replaced by bin indices B and B'; the resulting maximum likelihood estimate of Sb is 

ST = d'^Zsd - tr(NtZs), (106) 

where 



Z 



B 



\y,Fgl.{C-'PB'C-') (107) 



2 ^ 

B 



and 



2 \ dSB BSb' 

Upon utilizing eq. (105) we can express the band-averaged Fisher matrix (108) in terms of the generalized inverse (103) and 
the original unbinned Fisher matrix (78) in the form 

Fgg,=Y,Wl'iFii,Wlg„ (109) 

IV 

where VF^f = W^g. Eq. (106) is an unbiased estimator of the averaged quantity (101), i.e. {o^^) = Sb, by an argument 
analogous to that in eq. (84), and the covariance of two binned estimates is the inverse of the matrix (108)-(109), 

i:Tb,=co^{sT,s^}^)=f-1,, (110) 

by an argument analogous to that in eq. (85). The spacing of the bins B renders the matrix Fggi in eqs (108)-(109) 
invertible, enabling the quadratic estimator (106) to be numerically implemented and the associated covariance (110) to 
be determined. An argument analogous to that in subsection 6.5 shows that the resulting estimate is minimum-variance, 
i.e. Yaj:(S^^) — Fgg < var(5's) for any Sb satisfying (Sb) = Sb- In the case of contiguous, boxcar-weighted bins the 
band-averaged Fisher matrix (109) is simply Fggi = ^i^g '^ii^gi Pw ■ 



6.7 The white album 

The original unbinned maximum likelihood estimate (82) can be computed without iteration in the special case that the signal 
and noise are both white: Si = S and Ni = N . Even for a region R ^ Q,, the pixel-basis data covariance matrix can then be 
inverted: 

C = (S-|-iV)^P( = (An)-'(S'-|-iV)l so that C"' = An(S'-|-Af)-M. (Ill) 

I 
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The Fisher matrix obtained by substituting eq. (Ill) into (78) is related to the periodogram coupling matrix ol (56) by 



so that the matrix defined in eq. (83) is given by Zi = {4-K/A){AQ,fJ2i'^u^i^^' + l)~^Pi'- Inserting this into eq. (82) 
and comparing with eq. (57) we find that the maximum likelihood estimator coincides with the deconvolved periodogram 
estimator (68): Sf^^ = Sf^ if Si = S and Ni = N. The covariance computed using eq. (69) likewise coincides with the 
maximum likelihood covariance (85): 



noise Ni = N. 

6.8 Pros and cons 

Weighed against its highly desirable minimum-variance advantage, the maximum likelihood method of spectral estimation 
has a number of significant disadvantages: 

(i) It is intrinsically nonlinear, Sf^^ = /(d. Sf^^), requiring a good approximation to the spectrum Si to begin the iteration, 
and such a good initial guess may not always be available. It is critical to start in the global minimum basin since the 
Newton-Raphson iteration (79) will only converge to the nearest local minimum. 

(ii) Particularly for large data vectors d — (di d2 ■ ■ ■ dj)"^ , computation of the inverse data covariance matrix and the 
matrix products in eq. (79) can be a highly numerically intensive operation. The number of pixels in the WMAP cosmology 
experiment is J « 3 x 10® at five wavelengths (Gorski et al. 2005), and P;,P;/,C and are all non-sparse matrices. The 
nearly complete (80-85%) sky coverage enabled the WMAP team to develop and implement a pre-conditioned conjugate 
gradient technique to compute the three ingredients needed to determine the estimate S^^ and its covariance S^^/^, namely 
d"^(C"^P;C"^) d, tr(C"^Pi) and tr(C"^PiC"^Pi/) (Oh ct al. 1999; Hinshaw et al. 2003). Computational demands continue to 
increase: the upcoming PZ/j4A'^Cii' mission will detect J « 50 x 10® pixels at nine wavelengths (Efstathiou et al. 2005). 

(iii) Maximum likelihood estimation of individual spectral values Si is only numerically feasible for surveys such as WMAP 
that cover a substantial portion of the sphere; for smaller regions the method is limited to the estimation of binned values of 
the spectrum Sb-, and it is necessary to assume that the true spectrum Si can be adequately approximated by a coarse-grained 
spectrum S] that can be fully recovered from Sb- Even when A « 47r it may be advantageous to plot binned or band-averaged 
values of the individual estimates, because var(S';'^"^) may be very large, obscuring salient features of the spectrum. 

The multitaper method — which we discuss next — is applicable to regions of arbitrary area < ^ < 47r, does not require 
iteration or large-scale matrix inversion, and gives the analyst easy control over the resolution-variance trade-off that is at 
the heart of spectral estimation. 



7 MULTITAPER SPECTRAL ESTIMATION 

The multitaper method was first introduced into 1-D time series analysis in a seminal paper by Thomson (1982), and has 
recently been generalized to spectral estimation on a sphere by Wieczorek & Simons (2005, 2007). In essence, the method 
consists of multiplying the data by a series of specially designed orthogonal data tapers, and then combining the resulting 
spectra to obtain a single averaged estimate with reduced variance. In 1-D the tapers are the prolate spheroidal wavefunctions 
that are optimally concentrated in both the time and frequency domains (Slepian 1983; Percival & Walden 1993). We present 
a whirlwind review of the analogous spatiospectral concentration problem on a sphere in the next subsection; for a more 
thorough discussion see Simons et al. (2006). 

7.1 Spherical Slepian functions 

A bandlimited spherical Slepian function is one that has no power outside of the spectral interval < Z < L, i.e.. 




(112) 




(113) 



The deconvolved periodogram Sf^ is thus the best unbiased estimate of a white spectrum St = S contaminated by white 



L 




(114) 



Im 



but that has as much of its power as possible concentrated within a region R, i.e.. 



A = 




= maximum. 



(115) 
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Functions (114) that render the spatial-basis Rayleigh quotient in eq. (115) stationary are solutions to the (L + l)^ x (L + l)^ 
algebraic eigenvalue problem 



L 



^'^Di^^ir^iQl,^, = Xgim, (116) 
I'm' 

where Di^ ii^i = D*,^, ,^ are the spectral-basis matrix elements that we have encountered before, in eqs (22) and (52). The 
eigenvalues, which are a measure of the spatial concentration, are all real and positive, A = A* and A > 0; in addition, the 
eigencolumns satisfy gi -m = (— l)'"?*?^ so that the associated spatial eigenfunctions are all real, g(r) = g*{r). 

Instead of concentrating a bandlimited function g{r) of the form (114) into a spatial region R, we could seek to concentrate 

a spaceli'tn/ited function, 

h{r) = "^himYim{r) where him = / Yi*m{r) h{r) dft, (117) 
that vanishes outside R, within a spectral interval < / < L. The concentration measure analogous to (115) in that case is 

L 

A = = maximum. (118) 

Im 

Functions (117) that render the spectral-basis Rayleigh quotient (118) stationary are solutions to the Ftedholm integral 

eigenvalue equation 



D{r,r')h{r')dn' = Xh{r), r £ R, (119) 

R 

where 

L L 

D{r, r') = ^'"^ W ^'-('•') = II(2i + 1) P,(r • r'). (120) 

Im I 

In fact, the bandlimited and spacelimited eigenvalue problems (116) and (119) have the same eigenvalues A and are each 
other's duals. We are free to require that h{r) and g{r) coincide on the region of spatial concentration, i.e., h{r) = g^(r) or, 
equivalently, 

L 

hini = Y^Dim,i'm'gi'm.', < I < 00 , -l<m<l. (121) 

I'm' 

We shall focus primarily upon the bandlimited spherical Slopian functions g{r) throughout the remainder of this paper. 

We distinguish the (L + l)'^ eigcnsolutions by a Greek subscript, a — 1,2, . . . , (L + 1)^ , and rank them in order of their 
concentration, i.e., 1 > Ai > A2 > • • • A(£_,_i-)2 > 0. The largest eigenvalue Ai is strictly less than one because no function can 
be strictly contained within the spectral band < I < L and the spatial region 7? simultaneously. The Hermitian symmetry 
Dim,i'm' = Dp^i i^ also guarantees that the eigencolumns g^im in eq. (116) are mutually orthogonal; it is convenient in the 
present application to adopt a normalization that is slightly different from that used by Simons et al. (2006), namely 

L L L 

^ imS/J, im = 47r(5c,/3 and ^ ^fla,!mAm,i'm'fl/3,i'm' = 47rAa(5a/3 (122) 

Im Im I'm' 

or, equivalently. 



/ 

Jn 



ga{r) g/sir) dQ = Ait 6ai3 and / ga{r) gg{r) dQ = ATtXaSag. (123) 



The eigenfunction gi (r) associated with the largest eigenvalue Ai is the bandlimited function that is most spatially concentrated 
within R, the eigenfunction g2(r) is the next best concentrated function of the form (114) orthogonal to gi{r), and so on. 

The sum of the (L -t- 1)^ eigenvalues is a diagnostic area-bandwidth product known as the Shannon number which we 
denote by 

(L+lf 

K= Xc=Y^^rn,lm = ^{L+lf. (124) 

a Im 

A plot of Aa versus the rank a resembles a step function, with the first K eigenfunctions 30(1") having associated eigenvalues 
Aa 1 and being well concentrated within the region R, and the remainder having associated eigenvalues Aa and being 
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well concentrated within the complementary region Q — R. The eigenvalue-weighted sums of the product of two eigencolumns 
or eigenfunctions are given exactly by 

^a9a,lm9a,l'm' = ^T^Di^^li^i , (125) 

a 

(i+l)^ L L 

^ Ac5a(r)ffa(r') =47r^^y;„(r)D;m,i'm'i^i*m'(r')- (126) 

CK Im I'm' 

Because of the steplike character of the \a versus a eigenvalue spectrum, we can approximate eqs (125)-(126) by unweighted 
sums over just the first K eigenfunctions: 

K 

^5c«,(mfla,('m' ~ 47r-D;m,i'm'i (127) 

CK 

K L L 

^Sa(r)ff„(r') « 47r^^y,^(r) Anx.i'm' Yi1^,{r'). (128) 

a Im I'm' 

Whenever the area of the region i? is a small fraction of the area of the sphere, A ^ 47r, there will be many more well- 
excluded eigenfunctions ga{r) with insignificant (Aa « 0) eigenvalues than well-concentrated ones with significant (Aa « 1) 
eigenvalues, i.e., K <^ {L -\- 1)^. In the opposite extreme of nearly whole-sphere coverage, A 47r, there will be many more 
well-concentrated eigenfunctions (?a(r) than well-excluded ones, i.e., K ^ {L + 1)^. 

The axisymmetry of a single or double polar cap enables the {L + 1)^ x (L -|- 1)^ eigenvalue problem in eq. (116) to be 
decomposed into a series of {L—m+l)x {L—m+1) problems, one for each non-negative order < m < L. More importantly, the 
matrix governing each of these smaller fixed-order eigenvalue problems commutes with a tridiagonal matrix with analytically 
specified elements and a woU-bchaved spectrum, that can be diagonalized to find the bandlimited eigencolumns ga.im instead. 
We refrain from discussing this decomposition and the associated commuting matrix here, except to note that it makes the 
accurate computation of the well-concentrated eigenfunctions S'a(r) of even a large axisymmetric region R not only possible 
but essentially trivial (Griinbaum et al. 1982; Simons et al. 2006; Simons & Dahlen 2006). 



7.2 Data availability 

Thus far, in our discussion of the periodogram and maximum likelihood estimators, we have taken the point of view that 
the available data d(r) are strictly restricted to points r within the region R. We shall henceforth adopt a slightly difiierent 
viewpoint, namely that we are willing to allow data d{r) from a narrow region on the periphery of _R. This flexibility allows us to 
use the spatially concentrated, bandlimited tapers (/^(r) rather than the corresponding spectrally concentrated, spacclimitcd 
tapers ha{r) = 5^ (r) with spherical harmonic coefficients ha,im given by eq. (121). The small amount of spatial leakage from 
points r outside of R that we accept is offset by the advantage that there is no broadband bias in the resulting multitaper 
spectral estimates, as we shall see. The use of bandlimited rather than spacelimited tapers is natural in many geophysical 
applications, where we seek a spatially localized estimate of the spectrum Si of a signal s(r). In other applications the most 
natural viewpoint may be that the only available or usable data d(r) truly arc within a specified region R; in that case, it 
is necessary to replace ffa(r) by /la(i') in many of the formulas that follow, and the associated sums over < I < L become 
sums over < / < oo. 



7.3 Single-taper spectral estimate 

The first step in making a multitaper spectral estimate is to select the bandwidth L or the Shannon number K = {A/4tt){L+1)'^ 
and compute the associated bandlimited tapers ga{r),a = 1,2, ... , (L-|-l)^ that are well concentrated in the region of interest 
R. To obtain the ath single-taper estimate Sf, we multiply the data d(r) by g'a(r) prior to computing the noise-corrected 
power: 

2 

-Y^Mlt'Ni,. (129) 
I' 

The banded single-taper coupling matrix analogous to Kui in eqs (55) and (56) is 

^- = (^)B2p + 1)G..(J S (130) 
where 



21 + 1^ L 



g^ir) d(r)YiUr) dil 



Spectral estimation on a sphere 21 

Ga,p = ^^^|5a,p.l', 0<P<L, (131) 

Q 

is the power spectrum of the bandlimited taper ga{Tc). In the pixel basis eqs (129)-(130) become 

ST = [d^Crd - tr(NGr)] , (132) 

where Gj* is the J x J symmetric matrix with elements given by 

'21 + 1 



(Gf),,, =ff„(r,) 



'^Yim{rj)Yil,{rj,) 



9cirj,) = (^4^) 9a{rj)Pi{rj ■ r,/)ffc«(r,/). (133) 



The expected value of the ath estimate (129) is 

{Sn = ^^[tr(CGr)-tr(NGr)] 

(AQf . , . 

= ^- — ^trlSGi I noise bias cancels 

2/ + 1 ^ ' ^ 

I' 

= J2mS,Si>. (134) 

I' 

To verify the final step in the reduction (134) and thereby confirm that the pixel-basis product 

Mw = ^tr(GrP,) (135) 

is identical to the siriglc-tapcr coupling iricitrix in cqs (130)~(131), W6 traiiisforin to tli6 spaticil basis ciiid, replacs hpq — *■ Qa.,pq 
in the argument leading to eq. (54), to obtain the result 

Every row of the matrix M^, sums to unity, 

H = h ^^^^ + ^"'^ I ^"^""^ = ^' ^^^^^ 

by virtue of the Z-j identity (59). This is why we introduced the 47r normalization in eqs (122) and (123): to ensure that a 
single-taper spectral estimate S" has no leakage bias in the case of a perfectly white spectrum: [Sf) = S ii Si = S. 

7.4 Multitaper estimate 

A multitaper spectral estimate is simply a weighted linear combination of single-taper estimates, of the form 

a a 

The expected value of the estimate (138) is 

{Sf^) = ^Mii,Si, where Mu, =^CcMri, (139) 

V a 

is the multitaper coupling matrix. The constraint that the weights Ca in eq. (138) sum to unity guarantees that 

^ Mw = 1 so that {Sf^'^) =S if Si=S. (140) 

V 

Apart from this constraint, the weights are at our disposal. Two simple choices are eigenvalue weighting of all (L + 1)^ tapers, 
Cc=K-^\o., Q = 1,2,...,(L + 1)', (141) 
or equal weighting of only the first K tapers. 
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where K is the Shannon number (124). We expect the two choices (141) and (142) to lead to nearly identical spectral 
estimates Sf"^^ for the same reason that eqs (127)-(128) are a good approximation to eqs (125)-(126). Eigenvalue weighting 
has theoretical advantages, enabling us to obtain a more succinct expression for the multitaper coupling matrix and covariance; 
however, uniform weighting of only the first K tapers is, in practice, the best way to compute an actual spectral estimate 
Sf^'^ , for reasons of efficiency. Truncation at the Shannon number K retains only the bandlimitcd tapers (/a(r) that are well 
concentrated within the region R, so that Sf^'^ can be viewed as a spatially localized estimate of the spectrum Si. 



7.5 Leakage bias 

The eigenvalue- weighted power spectrum of all {L + 1)^ tapers ga{r) is simply 

^cGc,p = ^^^"^ Dpq,pg = Pp(l) dn = A for all < p < L, (143) 
by virtue of the identity (125). Because of this, the multitaper coupUng matrix in eq. (139) reduces to 

"-(!wt(^.-i)(^ lit a«) 

p 

It is remarkable that this result depends only upon the chosen bandwidth L and is completely independent of the size, shape 
or connectivity of the region R, even as R — Eq. (144) is strictly valid only for eigenvalue weighting (141) but, as just 
noted, we expect it to be a very good approximation for uniform weighting of the first K tapers (142) as well. For 1,1' ^ L 
we can use the 3-j asymptotic relation (19) to approximate (144) further by 

L 

^'''^ (ITl)lE[^H^-^'l('r/2)]'. (145) 

p 

This shows that for large / we expect M;;/ to take on a universal shape that depends only upon L and the offset from the target 
degree |/'— /|. Both the exact asymmetric relation (144), as we have seen before, and the symmetric large-Z approximation (145), 
by the spherical harmonic addition theorem, satisfy the constraint (140). 

In Fig. 6 we illustrate the variation of the coupling matrix M;;' versus the column index < < 100 for various target 
degrees 1 = 0, 10, 20, 30, 40, 50 and two different bandwidths, L = 20 and L = 10. A major advantage of the multitaper method 
is the easy control that it affords over the spectral leakage and resolution; the coupling is strictly confined to the interval 
\l' — 1\<L, of width L + min {I, L) + l, regardless of the size, shape or connectivity of the region R. The "triangular" coupling 
to the monopole degree I — is, by virtue of (18), exactly described by the relation Mo;/ — {21' + 1)/{L + 1)^, < I' < L; 
i.e. the degree-zero estimate Sq*^ is really an estimate of the total power within the band < I' < L. As the target degree 
I increases the coupling matrix Mui increasingly takes on a domelike universal shape that is approximately described by 
eq. (145). Fig. 7 shows a plot of this large-/ limit for four different bandwidths. L = 5, 10, 20. 30; the abscissa is the offset from 
the target degree, I' — I, which is confined to the closed interval [—L, L]. Rouglily speaking the shapes are all scaled versions 
of each other; recall that the height of the 2L + 1 bars in every graph must sum to one hundred percent. 



7.6 Multitaper covariance 

The covariance of two multitaper estimates (138) is a doubly weighted sum over all of the single-taper cross-covariances: 



T,ii, =COV[Sl ,6,/ ) = } ^CaS„r C/}, 

a/3 

where, as usual via the Isserlis identity (43), we have 
E-=cov(5r,5,^)=^^^^0^tr(CGrCGl^). 

Transforming to the spatial basis as in the derivation of eq. (67) we obtain 

^a/3 _ 2 



{2l + l){2l 
or, equivalently, 

-y 



J2{Sp + Np)f ff„(r)y;,(r)F;^(r)dn / gg{r')Y^,{r')Yi1^,{r') dQ' 

tutu' nn 



mm pq 



T-iap 



Y^{2p + l){Sp + Np)J2Yl V(2s + l)(2s' + l)5,,,t 
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(146) 
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(149) 



Spectral estimation on a sphere 23 



6 




1 = 50 



20 40 60 80 100 
degree 1' 



3 




1 = 50 



degree 1' 



Figure 6. Bar plots of the multitaper coupling matrix 100 x Mni for bandwidths L = 10 (top) and L = 20 (bottom). The (occasionally 
obscured) tick marks are at I' = 0, 20, 40, 60, 80, 100 on every offset abscissa; the target degrees / = 0, 10, 20, 30, 40, 50 are indicated on 
the right. The height of each bar reflects the percent leakage of the power at degree I' into the multitaper estimate S^"^ , in accordance 
with the constraint (140). Small numbers on top are the maximum value of 100 X M;;/ for every target degree I. 




Figure 7. Large-Z limits of the multitaper coupling matrix 100 X M;;/, plotted versus the offset /' — I from the target angular degree, 
for bandwidths for L = 5 (top left), L = 10 (bottom left), L = 20 (top right) and L = 30 (bottom right). The limiting shapes 
were found empirically by increasing I until the plots no longer changed visibly. The slight asymmetry reflects the inaccuracy of the 
approximation (145); the exact coupling matrix (144) is asymmetric because of the leading factor of 21' + 1. 



It is noteworthy that E^? = S^^ and S"'' = Sf"; however, it is not in general true that E^f = E^f = Sf°. Eqs (148) and 
(149) show that every clement of the multitaper-covariancc matrix is positive, > 0, as long as the weights are positive, 

Ca > 0. We shall henceforth limit attention to eigenvalue weighting, Ca = K^^Xa,a = 1,2, ...,(L + 1)^. The eigenvalue- 
weighted multitaper covaxiance E^,'"' can be written in a relatively simple approximate form in the case of a moderately 
colored spectrum, as we show in subsection 8.3. 
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7.7 Bias and mean squared error 

The bias of an eigenvalue-weiglited multitaper estimate Sf'^'^ = K^^ XaS^ is the discrepancy between its expected value 
and the true spectrum: 

bias {SD = ) -Si = Yl {Mn, - Sp. (150) 

\V-l\<L 

The bandlimited character of the tapers ga(r), Q = 1, 2. .... (L + 1)^ ensures that the bias is purely local; there is no broadband 
bias from harmonic degrees /' outside of the coupling interval |/' — /| < L. If the spectrum is not highly colored within this band, 
in the sense S;/ w Si, the bias will be small: X^|;/_;|<j^ {Mu, — Sw) Sii m Si X3|;/_(|<j;, (Af;;' — ^w) = 0, by virtue of (140). The 
total estimation error is given by Sf^"^ — Si and the mean-squared error is the expectation of the square of this: 

n^e{Sr)={{Sr-Si)'). (151) 

As is true for any estimate (e.g., Cox & Hinkley 1974; Bendat & Piersol 2000), the mean-squared error is the sum of the 

variance and the square of the bias: 

mse {Sf") = var (sf") + bias^ [sf") . (152) 

In CMB analyses the bias of Sf''^ is not a particularly critical issue because the ultimate objective (e.g., Jungman et al. 1996) is 
to determine ~10 cosmological parameters that characterize the inflationary universe (the baryonic-matter, cold-dark-matter 
and dark-energy densities Qb, JIa; the Hubble constant Ho, etc.) and this downstream estimation can be grounded upon 
estimates of either Si or MuiSii as long as the coupling matrix Mui is known. 



8 MODERATELY COLORED SPECTRA 

Eq. (149) and the analogous expression for the periodogram covariaiice, cq. (67), arc lengthy and therefore difficult to evaluate 
numerically; in this section we derive simpler expressions for Sf,?, Sjf/^ and the Fisher matrix Fui that should be good 
approximations for moderately colored spectra, for which it is permissible to replace 

Sp + Np^ s/ {Si + Ni){Si, + Ni,) (153) 

in equations such as (66) and (148). We write the resulting approximations using an = sign rather than an « sign, even 
though they are all strictly valid only in the case of a white signal contaminated by white noise: Si = S and Ni = N. 



8.1 Periodogram covariance 

Upon making the substitution (153) into eq. (66) and making use of the first of the identities in eq. (23), we obtain 

= {21 + 1X2/' + 1) +iV,0 (154) 

mm' 

or, via eq. (56) , equivalently, 

=^[^)\si+Ni){Si,+Ni,)Y,{'2p + 1)bJ[ o) -^-^iSi + Ni){Si,+Ni,){2l' + l)-'Kii,. (155) 

The covariance (155) for a moderately colored spectrum will be a better approximation for a large region, A m A-n, than for a 
small one, A -C 47r, because the extent of the coupling Kh, and thus the bandwidth over which the variation of the spectrum 
must be regarded as moderate increases as the size of the region R shrinks (see Fig. 4). In the limit (62) of a vanishingly small 
region, the signal and noise must be completely white. Si = S and Ni = N, in order for eq. (155) to be useful, and in that 
limit Bp -> A'^/{4'k) so that T,f^ -» 2(5 + NfSw, following eq. (59). 



8.2 Fisher matrix 

The inverse of the pixel-basis data covariance matrix C can be approximated in the case of a moderately colored spectrum (153) 
by a simple generalization of the exact result for a white spectrum, eq. (Ill): 

C- = (156) 
^{Si+Ni){Si,+Ni,) 

Upon either inserting this into eq. (78) or — as can be derived from eq. (153) with eqs (91) and (23) or via eqs (87) and (22) — 
the equivalent spectral-basis approximation 



V, 



into eq. (86), we obtain a compact approximate formula for the Fisher matrix 
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(157) 
(158) 



or, equivalently, 



Fni = 



(159) 



The result (159), which is due to Hinshaw et al. (2003), will also be more accurate for a large region than for a small 
one; in the limit of whole-sphere coverage, Bp AirSpo and Khi 5w so that SfJ 2{2l + l)~^{Si + NiYSw and 
Fw 1 (2' + l)(>S'i + Ni)~^ Sw , in agreement with eqs (45) and (92). Per (85), the maximum likelihood covariance Ejf,'" = F~,^ . 



8.3 Multitaper covariance 

The assumption that the spectrum is moderately colored is less restrictive for a multitaper spectral estimate Sf"^ than for 
a pcriodogram estimate Sf^, because the coupling Mui is confined to a narrow band, of width L + min(/,L) + 1, that is 
independent of the size, shape or connectivity of the region R. Upon modifying eq. (148) with eq. (153) and using eq. (9) we 
can write the cross-covariance of two single-taper estimates in the form 

„^ _ 2{Si+Ni){Si, +Ni,) ^ " ' 



(2Z + 1)(22' + 1) 



mm' 



/ fla(r)fl/3C 

Jn 



(160) 



where we have used the representation (9) of the Dirac delta function to reduce the two integrals inside the absolute value 
signs to one. Upon utilizing the spherical harmonic product identity (12) and evaluating the sum over m and m' using eq. (15) 
as in the derivation (51)-(54), we can reduce eq. (160) to 



(161) 



Substituting the representation (114) of S'a(r) and g/3{r) and using eq. (11) we can write eq. (161) in the convenient form 
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where we have defined the quantities 
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(163) 



(164) 



It is noteworthy that all the symmetries E^f — E"^ = Tjf" pertain in this moderately colored approximation. The eigenvalue- 
weighted multitaper covariance is given by a fornmla analogous to eq. (162), namely 

sr = ^(^i + iV0(^r + iv,)^(2p + i)rJ J J 

p 

where 

^ 1(2 y, AaTp'^A/J. 



(165) 



afj 



Upon using the identity (125) to express the double sum in eq. (165) in terms of -Dst^s'f and -D„„_u't,' and then using the 
boxcar window function (47) to express these matrix elements as integrals of three spherical harmonics over the whole sphere 
f2, to be reduced using eq. (11), we obtain a fivefold sum over the order indices t,t',v'v' and q, which can be reduced with 
the aid of eq. (16), leading to the relatively simple (and efficiently computable) result 
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(166) 
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where Be is the boxcar power, which depends on the shape of the region of interest, summed over angular degrees hmited by 
3-j selection rules to < e < 2L. The sums in eqs (162) and (164) are likewise limited to degrees < p < 2L, inasmuch as 
Tp^ = and Fp = for p > 2L. The effect of tapering with windows bandlimited to L is to introduce covariance between the 
estimates at any two different degrees I and I' that are separated by fewer than 2L + 1 degrees. 



8.4 Whole-sphere and infinitesimal-area limits 



It would obviously be perverse to contemplate using the multitaper method in the case of whole-sphere coverage; we neverthe- 
less present an analysis of the A — > 47r limit of the covariance 'S^^ in the interest of completeness. In that limit Be —^ 47r5eO, 
and both eqs (18) can be used to reduce eq. (166) to 



47r 



L 



(i+i).^(2.+i)(2.'+i)( I J ; 



(167) 



and thereby the multitaper covariance (164) to 
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(168) 



If the same band-averaged quantities '^i,Mii,Si' are estimated using the maximum likelihood method with whole-sphere 
coverage, the covariance in the moderately colored approximation (153) is 
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In fact, eqs (168) and (169) are identical by virtue of the 3-j identity 
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where we have used the Legendre product identity (13), and the representation (10) of the Dirac delta function (5(/i — /x') to 

reduce the double integral in the second lino. The above argument shows that the eigenvalue-weighted multitaper estimate 
Sf^'^ is the minimum- variance unbiased estimate of the averaged spectrum MiifSif in the limit R — fl. In practice, if we 
should ever be blessed with whole-sphere coverage, it would be easiest to compute this minimum-variance spectral estimate 
by simply forming a weighted average of the whole-sphere estimates (37)-(38). As we have just shown, eq. (169) specifies the 
covariance of such an estimate. 

Recalling that Be /{A-k) in the opposite limit of an infinitesimally small region and making use of the identity (17), 

we find that eq. (166) reduces to 
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(171) 



where we note that F^^*^ — Att. The resulting infinitesimal-area limit of the multitaper covariance S^/^ for a fixed bandwidth 
L is again of the form (164), with Fp replaced by its limiting value (171). If the Shannon number K = (A/47r)(L -I- 1)^ 
rather than the bandwidth L is held constant in taking the limit A ^ 0, then the multitaper coupling matrix (144) tends to 
Mil, ~* ^^^^(^/47r)(2i' -|- 1), i.e. all degrees across the entire spectrum are coupled. Both the signal and the noise must then be 



white for the limiting covariance, Ej, 



2{S + N) , to be a reasonable approximation. The latter can be derived by noting 



that, in taking the limit as prescribed by eq. (62) and using eq. (59), the Qxed-K result is Fp^ = 47r rather than (171). 
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9 SPECTRAL SHOOTOUT 

In this section we conduct a numerical variance analysis of the various estimates Sf^ , Sf^, S'f'"" and Sf^. We use the 
variance (46) of the whole-sphere estimate as a standard of comparison, computing the variance ratio 

(cT; ) = var(6j )/var(6j ) = / E;; (172) 

where XX stands for any of the acronyms SP, DP, ML or MT. The numerators in cq. (172) arc computed using the moderately 
colored approximations for E^'' derived in section 8. This has the advantage that a common factor of {Si + Ni)'^ cancels, 
leading to ratios (crf)^^ that are independent of the signal and noise spectra Si,Ni. Although the results we exhibit should be 
reasonable approximations for moderately colored spectra, they are only strictly correct in the case of a white signal, Si = S, 
contaminated by white noise, Ni = N. 



9.1 Variance of a periodogram estimate 

Fig. 8 shows the variation with degree I of the spherical-periodogram variance ratio, 

-(^)(t)"i:<^^-)«'(S s o)'' 

p 

for single and double polar caps of radii — 3°, 4°, 5°, 7°, 10°, 20°, 60°. The summation index p is limited by 3-j selection 
rules to even values, with the result that cq. (173) yields identical results for a single and double cap of the same radius Q, by 
virtue of the relations (50) and ^°"' — 2^°^^; stated another way, each double-cap estimate Sf^ averages over half as many 
adjacent degrees I' with a weighting Kui that is twice as large. The monopole variance ratio is (cq)^^ = 1 regardless of the 
cap size ©, but as the harmonic degree increases the variance ratio does as well, reaching a maximum at / m 6O°/0 and then 
oscillating mildly before eventually leveling off at a large-Z limit given by 

{Tlf = % ^(2p + 1) Bp [Pp{Q)f , (174) 
p 

where 

^p(O) = I pi 2-f [(p/2)!]-2 if p is even ^^'^^^ 

is the value of the Lcgcndrc polynomial of degree p at the argument /i = 0. The oscillatory interval is wider for small regions, 
A <C 47r, than for large ones, A ~ 47r. As expected, the high-degree variance (174) is greater for a smaller single or double 
cap, e.g., (<T%,)^^ = 12.3 for = 5° versus (cr^)^^ = 6.2 for G = 10°, because there are fewer pixelized data available to 
constrain the estimate Sf^ . A useful empirical approximation to eq. (174) for 6 ^ 65° is (ct^)^^ « 0.54 (47r/^°''P)^/^, which 
can be read off the right axis. In the limiting case of an infinitesimally small area, j4 — » 0, the variance is divergent; in fact, 
letting Bp A^/(47r) in eq. (173) we find that (aff^ 2Z + 1 for all < / < oo. 



9.2 Variance of a maximum likeliiiood estimate 

The maximum likelihood estimate Sf^^ and the deconvolved periodogram estimate coincide in the case Si = S and 
Ni = N , as we showed in subsection 6.7, and their common variance ratio is given by 

{cri ) =i(Ti) = [■^J , (176) 

To evaluate the ratio (176) we must compute and invert the boxcar coupling matrix Kui of eq, (56), taking care to avoid 
truncation effects from large values of / and V . Fig. 9 shows the variation of (erf )^"" = {(Tf)^^' with degree / for four double 
polar caps with radii G > 75°. For double caps that cover less of the sphere, the matrix Kui is too ill-conditioned to 
be invertible, and neither maximum likelihood estimation (82) nor deconvolution (68) of the periodogram estimate Sf^ is 
numerically feasible. As expected, the maximum likelihood variance is larger than the undeconvolved periodogram variance, 
e.g., (cr^)^*'^ = (<T^)^^ ~ 1.75 versus (u^)^^ ~ 1.05 for a double cap of radius O = 75°, because the averaging of the 
periodogram degrades the spectral resolution but improves the variance. In the limit of nearly whole-sphere coverage the 
maximum likelihood variance ratio can be approximated by (crf)'^'" = (a"?)°^ « (47r/A)^ shown on the right axis; i.e., the 
standard error is increased relative to that of a whole-sphere estimate by roughly the reciprocal of the fractional area of the 
region where there is data. This result can be derived by substituting the approximation Bp ~ {A^/'iTT)dpo in eq. (56) and 
using eq. (18). At whole-sphere coverage, A = 47r, and we obtain (erf )^'" = {af )°^ = 1, as expected. 
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Figure 8. Black dots connected by black lines show the pcriodogram variance ratio (fff)^^ as a function of degree < I < 50 for 
single and double polar caps of radii © = 3°, 4°, 5°, 7°, 10°. 20°. 60°. Grey iiorizontal lines labeled along the left vertical axis show the 
large-Z limits (o"^)^^. The open circle is the common mouopolc variance ratio (o"q)^^ = 1; the diagonal grey line is the infinitesimal-area 
hmit (a-f)^^ — > 2Z + 1. Labeled tick marks on the right show the approximation (o-^^)^^ « 0.54 (47r/A'^^P)'^/^. It is noteworthy that 
(cr^)^^ > (fg)^^ for all 0, with equality prevailing only in the limit = 90°: half-sphere coverage with a single cap yields the same 
variance as whole-sphere coverage. 
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Figure 9. Black dots connected by black lines show the maximum likelihood variance ratio (a^)^^ = (crf)^^ as a function of angular 
degree < i < 50 for double polar caps of radii = 89° , 85° , 80° , 75° . The ratio for a = 90° double "cap" is obviously unity (grey 
horizontal line). Labeled tick marks on the right show the nearly-whole-sphere approximation {(rf)^^ f» i'^f)^^ ^ (47r/yl)^. The slight 



downward "dimple" between I = 1-5 for = 75° is possibly an incipient numerical instability; attempts to invert the matrix Kni 
wider equatorial cuts lead to increasingly unstable results. 



for 



9.3 Variance of a multitaper estimate 

Fig. 10 shows the variation with harmonic degree / of the eigenvalue-weighted multitaper variance ratio, 

for single polar and double polar caps of various radii and for two different bandwidths, L = 10 and L = 20. The lowest 
variance for any region R and any bandwidth L is that of the monopole or / = harmonic, given by any of the three equivalent 
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expressions that are easily derived from eqs (177) and (166) using eqs (18), (56) and (124): 

e ss' St s'f/ a 

In the limit of whole-sphere coverage (ctq)^^ = 1/{L + 1)^, which is easiest to see by noting that in that case, eqs (22) and (7) 
show that Dst,s't' = Sss'Stt' ■ In the opposite limit of an infinitesimal area, Vq^° = Att due to eq. (171), and {(Jq)^'^ = 1, the 
largest possible monopolc variance ratio. No matter where it starts, the variance ratio (crf)^'^ increases as the target degree / 
increases, always reaching a maximum at I « 0.65L before decreasing equally quickly to an Z 3> i asymptotic limit given by 

2L 

(^-)''^ = ^ E(2P + 1) [PAO)f ■ (179) 

p 

The whole-sphere limit of eq. (179) is indicated by the four open circles in Fig. 10. Both this and the infinitesimal-area limit, 
which is off-scale in all four plots, are easily computed by respectively substituting Fp^*'^ from eq. (167) and Tp^° from 
eq. (171) into eq. (179), thereby avoiding the computation of the Wigner 6-j symbols needed for the more general Fp in 
eq. (166) or the even more cumbersome route through eqs (163) and (165). 

Fig. 11 shows the large-Z variance ratio {(j'^)^'^'^ plotted versus the bandwidths < L < 20 for single polar caps of 
various radii 0° < < 180° and double polar caps of various radii 0° < © < 90° . In the degenerate case L = 0, bandlimited 
"multitaper" estimation is tantamount to whole-sphere estimation so (cr^)*^^ = 1 regardless of the "cap" size 0. Indeed, in 
that case, the estimate is unbiased, Mni = Sui, and at L = 0, the single possible taper of the form eq. (114) is a constant over 
the entire sphere. For sufficiently large regions (6 ^ 30° for a single cap and G ^ 15° for a double cap) the large-Z variance ratio 
is a monotonically decreasing function of the bandwidth L; for smaller regions the ratio attains a maximum value [cf'^)^'^ > 1 
before decreasing. The grey curves are isolines of fixed Shannon number K = {A/A'k){L + 1)^; it is noteworthy that the K = 1 
isoline passes roughly through the maxima of (cr^)*^^, so that for K > 2-3 the variance ratio is a decreasing function of the 
bandwidth L regardless of the cap size. Since K is the number of retained tapers, it will always be greater than 2-3 in a 
realistic irmltitaper analysis. For large Shannon numbers, above K ~ 10, the dependence upon the bandwidth L and area A 
for both a single or double cap can be approximated by the empirical relation (cr^)^^ « {An / A)'^'^^ / {2L + 1). In particular, 
if j4 = 47r, the large-Z variance ratio is to a very good approximation equal to one divided by the number of adjacent degrees 
I — L<l'<l + L that are averaged over by the coupling matrix Mu/ . As noted in section 8.4, a whole-sphere multitaper 
estimate S^'^ can be regarded as a weighted linear combination of whole-sphere estimates of the form Mn'S^^ , so the 
variance is reduced by the number of independent random variates Sj^%, . . . , S^^, . . . , Sj^l, that contribute to the estimate. 
For smaller regions of area yl w 47r the whole-sphere variance ratio 1/(2L -|- 1) is empirically found to be increased by a factor 
(Aw/A)^'^^. In fact, it is very reasonable to approximate the nearly- whole-sphere variance ratio at large Shannon numbers by 
(af)^'^ « {4:n/A)°-^^{af)^Ii„ for all spherical harmonic degrees < I < oo. 

Finally, it is interesting to compare the large-Z variance ratio of a multitaper estimate [a^)^'^ with that of a spherical 
periodogram estimate (cr^)^^, in the case that the coupling to adjacent harmonic degrees I' is roughly the same. Referring 
to Figs. 5 and 7, for example, we see that the widths of the periodogram coupling matrices Kui for single polar caps of 
radii = 10°, 20°, 30° arc comparable to the widths of the multitaper coupling matrices M;;/ for bandwidths L — 20, 10, 5, 
respectively. In such cases the multitaper variance ratio is always less than the periodogram variance ratio by a factor that 
is close to the reciprocal of the Shannon number, i.e. (cr^)*^^ w (a^)^^ . This empirical approximation is reminiscent of 
the analogous situation in 1-D (Percival & Walden 1993). 



10 RESOLUTION VERSUS VARIANCE: AN EXAMPLE 

To illustrate the ease with which a multitaper spectral analyst can control the fundamental trade-off between spectral resolution 
and variance by altering the bandwidth L or Shannon number K — (^/47r)(L -|- 1)^, we consider a specific example in this 

penultimate section. We choose a cosmological rather than a geophysical example primarily because the CMB temperature 
spectrum Si has a readily computable theoretical shape for a specified set of cosmological parameters (Seljak & Zaldarriaga 
1996; Zaldarriaga et al. 1998; Zaldarriaga & Seljak 2000). Like many geophysical spectra the CMB spectrum is red, varying 
as Si ~ l~^ , with a number of interesting secondary features that one would like to resolve, including acoustic peaks at 

1 « 220, 550, 800 and higher. To counteract the redness it is conventional in CMB cosmology to plot not Si but rather the 
whitened spectrum 

(180) 

which is shown as the heavy black line in each of the panels of Fig. 12. The theoretical values of Si versus harmonic degree 

2 < I < 900 have boon computed for a set of nominal cosmic input parameters, including flh = 0.046, Qc = 0.224, Qa = 0.730 
and Ho = 72 kms~^Mpc~^, using the CMBFAST code that is publicly available at http://lambda.gsfc.nasa.gov. The 
monopole term <So, which is a measure of the average CMB temperature To = 2.725 K (Mather et al. 1999), and the dipole 
term <Si, which is strongly influenced by the proper motion of our galaxy relative to the CMB, are commonly omitted. The slight 
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Figure 10. Black dots connected by black lines show the variation of the multitaper variance ratio (af)^'^ with dcg rcc < Z < 50 for 
single polar caps of radii G = 15° , 20° , 30° , 60° (left two plots) and double polar caps of common radii 8 = 40° , 50° , 60° , 80° (right two 
plots). Top two plots are for a bandwidth L = 10 and bottom two plots arc for a bandwidth L = 20; the rounded Shannon numbers 
K = (A/47r)(L + 1)^ are indicated. Vertical dotted lines at / = 10 and / = 20 show that above I = L the variance ratio (cj^)^^ quickly 
reaches a large-/ asymptotic limit (u'^)^^'^ , given by eq. (179) and depicted by the grey horizontal lines labeled along the left vertical 
axis. Open circles on the right vertical axis are the whole-sphere, large-/ limits, obtained via eq. (167). 
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Figure 11. Variation of the large-/ multitaper variance ratio (c^)^^ with bandwidth < L < 20 for single polar caps of radii 
e = 0° , 10° , 20° , 30° , 40° , 50° , 70° , 100° , 180° (left) and double polar caps of common radii G = 0° , 5° , 10° , 20° , 30° , 40° , 60° , 90° (right) . 
Ranges of the Shannon number K = {A/4,n){L + 1)^ are distinguished by different symbols: open circles < K < 1, closed circles 
1 < if < 10, open squares W < K < 100, closed squares K > 100. Grey curves labeled K = 1, 10, 100 are Shannon number isolines. Axes 
are logarithmic to illustrate the 1/(2L + 1) bandwidth scaling above K Ri 10. 
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Figure 12. Heavy black lines and surrounding grey band depict the tiicorctical whitened CMB spectrum Si = l{l + 1)Si/{2it) and 
hypothetical whole-sphere WMAP estimation error [2/{2l + 1)]^''-^[<S; + l{l + l)A'^;/(27r)] as a function of angular degree in the range 
2 < I < 900. Open circles with attached error bars show the expected value and associated standard error {S^'^) ± [var(5j'^''')]-'^''^ of 
hypothetical multitapcr estimates of the whitened spectrum <S; for various bandwidths, ranging from L = 10 (upper left) to L = 60 
(lower right) . The multitapcr analysis region consists of two axisymmetric caps separated by an equatorial cut of width 20° . The angular 
scale 180°/[l(l + 1)]^^^ of the CMB temperature fluctuations is delineated along the top. 



fluctuations from point to point in the sky about the all-sky mean To are measured in /jK so the units of power Si are /iK^. The 
grey band surrounding the theoretical 5;-vorsus-/ curve is the standard error [var(>S;^^)]"'^^ = [2/(2/-|-l)]"'^'^ [5; + /(/ + l)7V;/(27r)] 
of a hypothetical whole-sky spectral estimate = l{l + 1)S^^ /{2n). The noise power Ni is assumed to be of the form (36) 
with pixelization, detector and beamwidth specifications that roughly correspond to those used in the WMAP spacecraft 
mapping experiment, namely Af2 — 4 x 10~^ sr, a — 100 /iK/pixel and ^fwhm ~ 20 arcmin. The thinning of the grey band at 
I ~ 350 represents the transition between the low-degree region where the uncertainty in a hypothetical whole-sphere WMAP 
estimate <Sj^^ is dominated by cosmic variance and the high-degree region where it is dominated by noise variance. The rapid 
increase in the whole-sky uncertainty above this transition is due to the exponential increase in the noise power (36) for 
harmonics that are below the angular resolution of the WMAP antennae. The total uncertainty [var(<S;^^)]^/^ due to both 
cosmic and noise variance represents the best we can ever do, if wc insist upon estimating individual values of the spectrum 
Si, even if we had uncontaminatcd whole-sky data. The elimination of contaminated data by a sky cut will always increase the 
variance; the only way to reduce it is to sacrifice spectral resolution. The six panels of Fig. 12 illustrate the effect of making a 
multitapcr estimate of the whitened spectrum Si, using tapers of increasing bandwidth L = 10, 20, 30, 40, 50, 60. The analysis 
region in every case is a double polar cap of common radius © = 80° , corresponding to an equatorial cut of width 20° , needed to 
mask the strong foreground contamination from the galactic plane. As wc have soon, the bandwidth alone controls the amount 
of bi£is deliberately introduced in this way, and not the size or shape of the analysis region — but the latter does influence the 
variance of the estimate. The open circles show the expected values of a multitapcr estimate (Sf^'^) = MuiSii, and the 
accompanying error bars show the associated standard error [var(5,'^"'^)]^/^ under the moderately colored approximation. The 
multitapcr method yields a band-averaged spectral estimate at every spherical harmonic degree I, but we have only plotted 
values {Sf"^) ± [var(iS;'^"'^)]^''^ whose coupling bands do not overlap, so that they are statistically uncorrelated. The spacing 
between the open-circle estimates is thus indicative of the spectral resolution. The discrepancy between the open circles and the 
heavy black Si-versMS-l curve is a measure of the local bias (150) induced by the averaging over adjacent degrees |/' — /| < L. As 
expected, the bias {Sf^'^) — Si is most pronounced in strongly colored regions of the spectrum, and it is an increasing function 
of the bandwidth L and thus the spectral extent of the averaging. For moderate values of the bandwidth, 10 < L < 40, the 
bias is acceptably small in the sense 

|(5MT^ - 5i| < .Si; in addition, the spacing between statistically independent estimates 
{Sf^'^) and the error bars ±[var(5('^^)]^/^ are sufficiently small to enable resolution of the first two spectral peaks at I w 220 
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and I ~ 550. Bandwidths in this range are therefore suitable for multitaper spectral analysis of WMAP temperature data 
on the cut sky. One can either opt for finer spectral resolution with larger error bars (L = 10) or for coarser resolution with 
somewhat smaller error bars (L = 40); to a good approximation the standard error [var(<S('^^)]^^^ scales with the bandwidth 
L as (2L + 1)^^^^ , as we have seen. Because multitaper spectral analysis does not require iteration or large-scale matrix 
inversion, it is easy to perform analyses for a variety of bandwidths in the range 10 < L < 40 and compare the results. In 
all cases the multitaper errors are significantly smaller than the uncertainty of a hypothetical whole-sky estimate of iSj, with 
no band averaging. Resolution of the CMB spectral features at higher degrees, above I w 700, will require a narrowing of the 
beamwidth ^twhm and/or a reduction in the instrument noise cr; motivated by this need and a number of other astrophysical 
considerations, both ground-based and space-based CMB experiments with narrower-aperture antennae and more sensitive 
detectors are in advanced stages of development (e.g., Kosowsky 2003; Efstathiou et al. 2005). 

11 OVERVIEW AND CONCLUSION 

Each of the spectral estimators that we have reviewed or introduced in this paper can be expressed in the general, noise- 
corrected quadratic form (94), which we repeat here for convenience: 

S-, =d"'Zid-tr(NZ,). (181) 
The expected value and the covariarice of such a quadratic estimator are given by eqs (95) and (96), which we also repeat: 

{Si)=Y^Zu,Si, where = tr(Z,P,,), and = cov(5',, 5;/) = 2tr (CZiCZ,,) ■ (182) 

V 

The specific forms of the symmetric, J x J pixel-basis matrix Z; in the various instances are 

(An)2 

whole sphere: Z; = Pi where Pi covers all of f2. 



/47r\ (Af2)^ 

spherical periodogram: Z; = I — 1 ^ — — Pi where P; only covers R, 



max;imum likelihood: 



1 1 (183) 

-^F-/(C-^P(,C-i) where = - tr(C-ip,C"^P,,) , 



2 

I 



multitaper: Z; = G; where G; = — AaG; . 

a 

In writing the final relation in eq. (183) we have assumed that the individual tapers are weighted by the normalized eigen- 
values Xa of the spatial concentration problem sensu Slepian, eqs (115)-(116). The whole-sphere and maximum likelihood 
estimates are unbiased, i.e. Zw = Sw, whereas the periodogram, with Zui = Kui given by eq. (56), and the eigenvalue- 
weighted multitaper estimate, with Zui — Mni given by eq. (144), are biased by spectral leakage from neighboring degrees 
I' I. The leakage bias of the periodogram is uncontrollable and can be extensive, particularly for small regions of area 
A <^ An, rendering the method unsuitable in applications. The extent of the multitaper coupling is in contrast confined to a 
narrow bandwidth interval \l' ~ l\ < L that is specified by the analyst. 

The covariance of a whole-sphere estimate is = 2{2l + l)~^{Si + Ni)^Siii and the covariance of a maximum likelihood 
estimate is the inverse of the Fisher matrix of eq. (86), Sjf,'^ — . In the limit of whole-sphere coverage, A — 4tv, the 
two methods coincide and var(5';*^) = 2(2/ -|- 1)~^(S'; -|- A'j)^ is the minimum possible variance achievable for any unbiased 
spherical spectral estimator. The covariance of a periodogram estimate is given by eq. (67) whereas that of a umltitaper 
estimate is given by eqs (146) and (149). For moderately colored spectra these cumbersome expressions for EfJ and 'Sf},'^ can 
be approximated by eqs (155) and (164)-(166), and the Fisher matrix Fni can be approximated by eq. (159). 

The maxinmm likelihood method is attractive and has received widespread use in CMB cosmology, because it provides 
the best unbiased estimate Sf^^ of the spectrum Si in the sense that it has minimum variance. This desirable feature is ofi^set 
by a number of disadvantages that we enumerate in subsection 6.8; specifically, it is only feasible without binning for nearly- 
whole-sphere analyses, A ~ An, and even then it requires a good initial estimate of the spectrum Si, non-linear iteration to 
converge to the minimum- variance solution Sf^^, and large-scale computation to find the inverse matrices and F~,^ . For 
smaller regions, of area A Tfe 47r, it is possible to obtain minimum-variance, unbiased estimates of a binned spectrum 
Sb ~ WbiSi using eqs (106)-(109); however, this requires the somewhat artificial assumption that the true spectrum Si 
can be adequately approximated by a coarse-grained spectrum = W^gSg, where WgiWj'g, — 5bb>- 

The multitaper method is distinguished by its case of use, requiring neither iteration nor large-scale matrix inversion. 
Unlike the unbinned maximum likelihood method, it yields a smoothed and therefore biased estimate of the spectrum, 
(S'f*^) — MiiiSii; however, the bias is generally small because it is strictly local, provided that one uses bandlimited 
rather than spacelimited spherical tapers, and the sacrifice of spectral resolution comes with an auxiliary benefit, namely a 
reduction by a factor of order (2L + 1) in the variance of the smoothed estimate, var (Sf ^). By varying the bandwidth L or 
the Shannon number K = {A/A'k){L + 1)^, a multitaper analyst can quickly navigate to any subjectively desirable point on 
the resolution-versus-variance trade-off curve. The only slight disadvantage of the method is that the shape of the matrix M;;/ 
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within the couphng band \l' — l\ < L, and thus the character of the smoothed spectrum MniSi' that one is estimating, 
cannot be arbitrarily specified. The couphng matrix Mni for an eigenvalue-weighted multitaper estimate is illustrated in 
Figs. 6 and 7. In geophysical, geodetic and planetary science applications the objective is generally to obtain a spatially 
localized estimate of the spectrum Si of a signal s(r) within a pre-selected region R of area A <^ An. The multitaper method 
with spatially woll-conccntratod, bandlimitcd tapers (?c<(r) is ideally suited for this purpose, and can bo easily extended to 
estimate cross spectra of two signals such as gravity and topography, enabling admittance and coherence analyses. The spatial 
leakage from data outside of the target region R can be quelled and the analysis expedited by averaging only the first K 
tapered estimates , as in eq. (142). 
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